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published, these documents reflect a tremendous amount of unique expertise, knowledge, and 
experience. 

The Working Paper Series was created in order to preserve the information contained 
in these documents and to promote the sharing of valuable work experience and knowledge. 
However, these documents were prepared under different formats and did not undergo 
vigorous NCES publication review and editing prior to their inclusion in the series. 
Consequently, we encourage users of the series to consult the individual authors for citations. 

To receive information about submitting manuscripts or obtaining copies of the series, 
please contact Ruth R. Harris at (202) 219-1831 or U.S. Department of Education, Office of 
Educational Research and Improvement, National Center for Education Statistics, 555 New 
Jersey Ave., N.W., Room 400, Washington, D.C. 20208-5654. 



Susan Ahmed 

Chief Mathematical Statistician 
Statistical Standards and 
Services Group 



Samuel S. Peng 
Director 

Methodology, Training, and 
Service Program 



STATISTICS FOR POLICYMAKERS

or

EVERYTHING YOU WANTED TO KNOW ABOUT STATISTICS
BUT THOUGHT YOU COULD NEVER UNDERSTAND

Introduction 



This working paper contains overheads used in a seminar developed by Susan Ahmed, NCES 
Chief Statistician. The seminar, titled "Statistics for Policymakers or Everything You Wanted 
to Know About Statistics But Thought You Could Never Understand," is designed to 
introduce some basic concepts of statistics to nonstatisticians. There are two main parts to the 
seminar. The first covers basic statistical concepts; the second covers some basic principles 
of research design and analysis. 

Dr. Ahmed has presented the seminar to policymakers at the Department of Education, at an 
NCES Summer Data Conference, to newspaper reporters at the Baltimore Sun, to education 
writers at two Education Writers Association Annual Meetings, at the 1997 annual meeting 
of the National Conference of State Legislatures, and as the keynote address at the 1997 
meeting of state library data coordinators. 



Essentials of Statistics and Analysis: An Overview 

I. Essentials of Statistics 

A. Population, Sample, and Inference 

B. Standard Errors and Confidence Intervals 

• What are they and why are they important? How do you interpret them? 

C. Statistical Significance 

• What does it mean when a result is statistically significant? 

• What is the difference between statistical and substantive significance? 

• Can a result not be statistically significant and still be noteworthy? If a result 
is statistically significant, does it mean it's true? 

D. Correlation and Linear Regression 

• What are they? How do you interpret results based upon correlation or 
regression? Can you determine causality from cross-sectional data? From 
longitudinal data? 

E. Graphics 

• A discussion of how graphics can both mislead and enlighten the reader of 
statistical reports. Pitfalls in interpreting graphics. 

• The importance of skepticism. 

II. Some Basic Principles of Research Design and Analysis 

A. Operationalizing Your Terms 

B. Selection Bias 

C. Need for Control Group 

D. Nonresponse Bias 

E. Confounding 

F. Validity 

G. Reliability 

H. Generalizing/External Validity 

I. ESSENTIALS OF STATISTICS 



EVALUATION OF CLAIMS MADE ABOUT DATA: 

- WHEN TO BELIEVE THEM 

- WHEN TO BE SKEPTICAL 

- WHEN TO IGNORE THEM 



[Cartoon] 

SCHROEDER, WHY 
DON'T YOU GIVE UP 
THIS CLASSICAL 
MUSIC THING? 

DON'T YOU KNOW THERE ARE 
OVER EIGHTY MILLION PIANO 
STUDENTS IN THIS COUNTRY? 

AND LESS THAN ONE PERCENT 
OF THEM EVER MAKE A REAL 
LIVING AT IT 

POPULATION AND SAMPLE 

FREQUENCY DISTRIBUTION 



EXAMPLE: CLASS OF H.S. BOYS LINED UP FROM SHORTEST TO 
TALLEST 



The raw material of a frequency distribution 



60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 



BELL CURVE/NORMAL DISTRIBUTION 



A perfect bell curve 



STANDARD DEVIATION: A MEASURE OF VARIABILITY, ALMOST 
LIKE AN AVERAGE DISTANCE FROM THE MEAN. ACTUALLY 
SQRT OF AVERAGE SQUARED DISTANCE FROM THE MEAN. 
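
The definition above can be sketched in a few lines. This is only an illustration: the heights are made-up values, not data from the seminar, and the function name is an assumption chosen for readability.

```python
import math

def standard_deviation(values):
    """Population SD: square root of the average squared distance from the mean."""
    mean = sum(values) / len(values)
    squared_distances = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_distances) / len(values))

# Hypothetical heights in inches, for illustration only.
heights = [64, 66, 68, 70, 72]
sd = standard_deviation(heights)
print(f"mean = {sum(heights) / len(heights)}, SD = {sd:.2f}")
```

The typical distance from the mean here is a little under 3 inches, which matches the "almost like an average distance" reading of the standard deviation.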



A bell curve cut into standard deviations 




68% OF THE POPULATION LIES WITHIN +/- 1 STD DEV 
95% OF THE POPULATION LIES WITHIN +/- 2 STD DEVS 
99% OF THE POPULATION LIES WITHIN +/- 3 STD DEVS 
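
The percentages on this overhead are rounded. A short check, assuming an exactly normal distribution, uses the error function from Python's standard library to compute the share of the population within 1, 2, and 3 standard deviations. The function name is an illustrative choice.

```python
import math

def share_within(k):
    """Share of a normal population lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Percentages within 1, 2, and 3 standard deviations, rounded to one decimal.
shares = {k: round(100 * share_within(k), 1) for k in (1, 2, 3)}
for k, pct in shares.items():
    print(f"within +/- {k} SD: {pct}%")
```

The exact figures are about 68.3, 95.4, and 99.7 percent, so the overhead's 68, 95, and 99 are convenient round numbers.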



COMPARING STANDARD DEVIATIONS: 



Standard deviations cut off the same portions of the population for 
any normal distribution 




E.G. HEIGHTS OF WOMEN GYMNASTS AND HEIGHTS OF 
BASKETBALL PLAYERS: 

MEAN (WG) = 61" SD=2" 

MEAN (BP) = 78" SD=4" 

WHICH OF THE FOLLOWING IS MORE UNUSUAL? 

A 66" WG OR A 84" BP? 

WG = (66-61)/2 = 2.5 (2.5 SDs ABOVE MEAN) 

BP = (84-78)/4 = 1.5 (1.5 SDs ABOVE MEAN) 

THE WG IS MORE UNUSUAL THAN THE BP. 
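
The comparison above is a standard-score calculation: subtract the group mean and divide by the group standard deviation. A minimal sketch, using the means and standard deviations from the overhead (the function and variable names are illustrative):

```python
def sds_above_mean(value, mean, sd):
    """How many standard deviations a value lies above its group mean."""
    return (value - mean) / sd

gymnast = sds_above_mean(66, mean=61, sd=2)   # 66" woman gymnast
player = sds_above_mean(84, mean=78, sd=4)    # 84" basketball player

# The value farther from its own mean, in SD units, is the more unusual one.
more_unusual = "WG" if abs(gymnast) > abs(player) else "BP"
print(gymnast, player, more_unusual)
```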



POTENTIAL CLAIMS 



1. THE ONE YEAR ATTRITION RATE AMONG 
VOC ED TEACHERS IN PRIVATE SCHOOLS IN 
1990-91 WAS 44%. THE RATE FOR ALL PRIVATE 
SCHOOL TEACHERS WAS 12%. 



2. BLACK EIGHTH GRADERS AND WHITE 
EIGHTH GRADERS DIFFER IN MATH 
ACHIEVEMENT SCORES. 



3. THERE IS A NEGATIVE ASSOCIATION 
BETWEEN TV WATCHING AND ACHIEVEMENT 
SCORES. 



SAMPLING DISTRIBUTION/SAMPLING VARIABILITY 







? 



my sample 



QUESTIONS TO BE ASKED BEFORE 
ACCEPTING AN ESTIMATE OR A CLAIM 



1. SINCE THIS ESTIMATE IS BASED ON ONE 
SINGLE SAMPLE AMONG MANY THAT MIGHT 
HAVE BEEN DRAWN, AND KNOWING THAT 
DIFFERENT SAMPLES WOULD MOST LIKELY 
PRODUCE DIFFERENT ESTIMATES, HOW 
COMFORTABLE CAN I FEEL WITH THIS 
RESULT? 




HOW MUCH WOULD ESTIMATES FROM 
DIFFERENT SAMPLES VARY? 

(STANDARD ERROR) 

HOW CERTAIN CAN I BE ABOUT THIS 
ESTIMATE? WHAT IS THE MARGIN OF 
ERROR? HOW FAR OFF COULD I BE? 



(CONFIDENCE INTERVALS) 




2. IN MAKING A STATEMENT COMPARING 
TWO GROUPS OR ABOUT THE ASSOCIATION 
BETWEEN TWO VARIABLES, DOES THE 
EVIDENCE PROVIDED BY THE DATA SUPPORT 
THE STATEMENT? 




HOW DO WE PROVE OR DISPROVE A 
HYPOTHESIS REGARDING GROUP 
DIFFERENCES OR ASSOCIATIONS? 

(HYPOTHESIS TESTING) 

COULD THE DIFFERENCE OR THE 
ASSOCIATION WE ARE SEEING BE DUE 
TO CHANCE? 

(STATISTICALLY SIGNIFICANT) 



3. HOW CAN WE DISPLAY OUR RESULTS 
HONESTLY? 



(MISLEADING GRAPHS) 



QUESTION 1 



SINCE THIS ESTIMATE IS BASED ON ONE 
SINGLE SAMPLE AMONG MANY THAT MIGHT 
HAVE BEEN DRAWN, AND KNOWING THAT 
DIFFERENT SAMPLES WOULD MOST LIKELY 
PRODUCE DIFFERENT ESTIMATES, HOW 
COMFORTABLE CAN I FEEL WITH THIS 
RESULT? 






QUESTION 1A 

HOW MUCH WOULD ESTIMATES FROM 
DIFFERENT SAMPLES VARY? 



(STANDARD ERROR) 




STANDARD ERROR: MEASURE OF THE 
VARIABILITY OF A STATISTIC 
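
One way to see what the standard error measures is to simulate it. The sketch below is an illustration under stated assumptions, not part of the seminar: it invents a population of 10,000 scores (mean near 250, SD near 30), draws 2,000 simple random samples of 100, and looks at how the sample means spread out.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores with mean near 250 and SD near 30.
population = [random.gauss(250, 30) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Draw many samples of the same size and record each sample mean.
sample_size = 100
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(2_000)
]

# The standard error is the standard deviation of those sample means.
standard_error = statistics.pstdev(sample_means)

# Share of sample means that fall within 2 standard errors of the true mean.
within_two = sum(
    abs(m - true_mean) <= 2 * standard_error for m in sample_means
) / len(sample_means)

print(f"standard error is about {standard_error:.2f}")
print(f"share of sample means within 2 SE: {within_two:.1%}")
```

With these made-up settings the standard error should come out near 3 (roughly 30 divided by the square root of 100), and roughly 95 percent of the sample means should land within two standard errors of the true mean.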




[Figure: sample means (x) plotted around the true mean, in two panels.] 

small std error: all sample means 
are tightly grouped around true mean. 

large standard error: sample means 
are widely spread around true mean. 

95% of all sample means will lie within 2 std 
errors of the true mean 

WHAT AFFECTS THE SIZE OF THE STANDARD ERROR? 

The standard error is affected by 

(1) the amount of variability of the 
measurement in the population 

(2) the sample size 

less variability -> smaller std error 
larger sample size -> smaller std error 
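
Both effects show up in the textbook formula for the standard error of a mean, SE = SD / sqrt(n). The sketch below assumes simple random sampling, which is a simplification: surveys with complex sample designs, such as those behind the tables in this paper, estimate standard errors with other methods. The numbers are made up for illustration.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of a sample mean under simple random sampling."""
    return sd / math.sqrt(n)

se_base = standard_error_of_mean(sd=30, n=100)
se_less_variable = standard_error_of_mean(sd=15, n=100)   # half the variability
se_larger_sample = standard_error_of_mean(sd=30, n=400)   # four times the sample

print(se_base, se_less_variable, se_larger_sample)
```

Halving the variability halves the standard error, but the sample has to be four times as large to achieve the same reduction.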



Table A1. Standard errors for attrition rates from the teaching profession, by main field of 
assignment: 1987-88 to 1988-89 and 1990-91 to 1991-92 (table 1) 

                                       Public               Private 
Main field of assignment          1987-88   1990-91    1987-88   1990-91 

Total                               0.30      0.36       0.85      0.80 
Kindergarten                        0.69      1.56       2.65      2.74 
General elementary                  0.64      0.61       1.23      1.28 
Art/music                           0.79      1.44       4.38      3.26 
Bilingual/ESL                       3.11      2.04        --        -- 
Business                            2.27      3.64      24.45      7.65 
English/language arts               1.76      1.09       3.38      3.12 
Health                              0.81      0.85       2.99      4.37 
Home economics                      2.35      1.08      19.44       -- 
Industrial arts                     1.27      0.87        --        -- 
Math                                0.74      1.29       2.64      2.89 
Reading                             1.25      1.22       3.13     13.49 
Social studies                      1.73      1.22       2.86      3.66 
Science total                       1.21      1.96       2.25      2.08 
  Biology                           0.94      1.17       5.05      3.55 
  Chemistry/physics                 2.06      2.38       4.12      3.28 
  General science/earth science     2.09      3.71       3.75      3.05 
Special education total             1.23      0.93       9.21      3.95 
  Mentally retarded                 4.24      1.72      15.84       -- 
  Learning disabled                 0.65      0.92      10.34      2.57 
  Other special education           2.51      1.26      18.13      6.91 
Vocational education                2.47      1.67       0.00     30.80 
Foreign languages                    ++       0.44        ++       3.69 
All others*                         0.78      1.01       3.64      3.03 

-- Too few cases for a reliable estimate. 

* Includes computer science, remedial education, religion, gifted, prekindergarten, and all 
others (and foreign languages in 1987-88). 

++ Foreign languages in 1987-88 was included in the "All others" category. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Teacher 
Followup Survey, 1988-89 and 1991-92. 



QUESTION 1B 



HOW CERTAIN CAN I BE ABOUT THIS 
ESTIMATE? WHAT IS THE MARGIN OF 
ERROR? HOW FAR OFF COULD I BE? 



(CONFIDENCE INTERVALS) 



INTERPRETATION OF A CONFIDENCE INTERVAL 



EXAMPLE: IN THE CONDITION OF EDUCATION, 
INDICATOR 13 PRESENTS THE FOLLOWING 
DATA FOR NAEP MATH SCORES FOR EIGHTH 
GRADERS: 

BLACKS: MEAN=249 SE=2.3 



A 95% CONFIDENCE INTERVAL IS AN INTERVAL 
CONSTRUCTED IN SUCH A WAY THAT YOU CAN 
BE 95% CONFIDENT THAT THE VALUE FOR 
THE WHOLE POPULATION FALLS IN THE 
INTERVAL. 



A 95% CONFIDENCE INTERVAL WOULD BE 
CALCULATED AS FOLLOWS: 



estimate +/- 1.96(se) 

249 +/- (1.96 x 2.3) = 249 +/- 4.5 (margin of error) 

= (244.5, 253.5) 



INTERPRETATION: 

WE ARE 95% CONFIDENT THAT THE INTERVAL 
(244.5, 253.5) INCLUDES THE TRUE AVERAGE 
NAEP SCORE FOR ALL BLACK EIGHTH GRADERS. 
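
The arithmetic behind this interval can be reproduced directly from the estimate and its standard error. A minimal sketch, using the NAEP figures shown above (the function name is an illustrative choice):

```python
def confidence_interval_95(estimate, se):
    """95% confidence interval: estimate +/- 1.96 standard errors."""
    margin = 1.96 * se
    return estimate - margin, estimate + margin, margin

low, high, margin = confidence_interval_95(249, 2.3)
print(f"margin of error = {margin:.1f}")
print(f"95% confidence interval = ({low:.1f}, {high:.1f})")
```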



WHAT DOES "95% CONFIDENT" MEAN? 

because if we were to draw all possible samples 
of the same size and construct confidence intervals for each 
then 95% of the intervals would include the true mean. 
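
This "all possible samples" idea can be approximated by simulation. The sketch below is illustrative only: it assumes a normal population with a known true mean of 250 and SD of 30 (made-up values), draws 2,000 samples of 100, builds a 95% interval from each, and counts how many intervals contain the true mean.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population settings, chosen only for illustration.
TRUE_MEAN, TRUE_SD, SAMPLE_SIZE, TRIALS = 250, 30, 100, 2_000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    estimate = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    # Does this sample's 95% interval include the true mean?
    if estimate - 1.96 * se <= TRUE_MEAN <= estimate + 1.96 * se:
        covered += 1

coverage = covered / TRIALS
print(f"share of intervals that include the true mean: {coverage:.1%}")
```

The share should come out close to 95 percent. Any single interval either contains the true mean or does not; the 95 percent describes how often the procedure succeeds.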



Sources of Data 



General Information 

The information presented in this report was 
obtained from many sources, including federal 
and state agencies, private research 
organizations, and professional associations. The 
data were collected using many research 
methods including surveys of a universe (such 
as all school districts) or of a sample, 
compilations of administrative records, and 
statistical projections. Users of The Condition of 
Education should take particular care when 
comparing data from different sources. 
Differences in procedures, timing, phrasing of 
questions, interviewer training, and so forth 
mean that the results are not strictly comparable. 
Following the general discussion of data 
accuracy below, descriptions of the information 
sources and data collection methods are 
presented, grouped by sponsoring organization. 
More extensive documentation of procedures 
used in one survey than in another does not 
imply more problems with the data, only that 
more information is available. 

Unless otherwise noted, all comparisons cited in 
the text were tested for significance using t-tests 
and are significant at the .05 level. However, 
when multiple comparisons are cited, a 
Bonferroni adjustment to the significance level 
was made. When other tests were used, they 
are described in a note on the indicator page or 
in the supplemental note for the indicator. 
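
A Bonferroni adjustment divides the overall significance level by the number of comparisons being made, so each individual comparison has to clear a stricter bar. A minimal sketch, with hypothetical p values that are not taken from any NCES report:

```python
def bonferroni_level(alpha, comparisons):
    """Per-comparison significance level after a Bonferroni adjustment."""
    return alpha / comparisons

# Hypothetical p values from three comparisons made at once.
p_values = [0.004, 0.020, 0.300]
level = bonferroni_level(0.05, len(p_values))
significant = [p < level for p in p_values]

print(f"adjusted level = {level:.4f}")
print(significant)
```

The second comparison would pass at the unadjusted .05 level but not at the adjusted level of about .0167.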

The accuracy of any statistic is determined by 
the joint effects of "sampling" and "nonsampling" 
errors. Estimates based on a sample will differ 
somewhat from the figures that would have 
been obtained if a complete census had been 
taken using the same survey instruments, 
instructions, and procedures. In addition to 
such sampling errors, all surveys, both universe 
and sample, are subject to design, reporting, and 
processing errors and errors due to nonresponse. 
To the extent possible, these nonsampling errors 
are kept to a minimum by methods built into the 
survey procedures. In general, however, the 
effects of nonsampling errors are more difficult 
to gauge than those produced by sampling 
variability. 



The estimated standard error of a statistic is a 
measure of the variation due to sampling and 
can be used to examine the precision obtained in 
a particular sample. The sample estimate and 
an estimate of its standard error permit the 
construction of interval estimates with 
prescribed confidence that the interval includes 
the average result of all possible samples. If all 
possible samples were selected, each of these 
surveyed under essentially the same conditions, 
and an estimate and its standard error were 
calculated from each sample, then approximately 
90 percent of the intervals from 1.6 standard 
errors below the estimate to 1.6 standard errors 
above the estimate would include the average 
value from all possible samples; 95 percent of 
the intervals from two standard errors below the 
estimate to two standard errors above the 
estimate would include the average value of all 
possible samples; and 99 percent of all intervals 
from 2.5 standard errors below the estimate to 
2.5 standard errors above the estimate would 
include the average value of all possible 
samples. These intervals are called 90 percent, 
95 percent, and 99 percent confidence intervals, 
respectively. 



To illustrate this further, consider the text table 
for indicator 1 and table 1-2 for estimates of 
standard errors from Census Current Population 
Surveys. For the 1991 estimate of the percentage 
of 3-year-olds enrolled in school (28.2 percent), 
supplemental table 1-2 shows a standard error of 
1.2. Therefore, we can construct a 95 percent 
confidence interval from 25.8 to 30.6 (28.2 ± 2 x 
1.2). If this procedure were followed for every 
possible sample, about 95 percent of the 
intervals would include the average for all 
possible samples. 
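
The multipliers quoted above (1.6, 2, and 2.5 standard errors) are rounded. The sketch below, which assumes a normal sampling distribution, uses Python's standard library to compute the usual multipliers and then repeats the enrollment example, taking the standard error to be 1.2 percentage points, the value consistent with an interval of 25.8 to 30.6.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Number of standard errors on each side for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

multipliers = {c: round(z_multiplier(c), 2) for c in (0.90, 0.95, 0.99)}
print(multipliers)

# Worked example: 28.2 percent with a standard error of 1.2,
# using the rounded multiplier of 2 as the text does.
estimate, se = 28.2, 1.2
interval = (round(estimate - 2 * se, 1), round(estimate + 2 * se, 1))
print(interval)
```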

Standard errors can help assess how valid a 
comparison between two estimates might be. 

The standard error of a difference between two 
sample estimates is approximately equal to the 
square root of the sum of the squared standard 
errors of the estimates. The standard error (se) 
of the difference between sample estimate "a" 

12.3 ± 1.96(0.80) = (10.7, 13.9) 

Table 1. Attrition rates from the teaching profession, by main field of assignment: 
1987-88 to 1988-89 and 1990-91 to 1991-92 

                                       Public               Private 
Main field of assignment          1987-88   1990-91    1987-88   1990-91 

Total                               5.6       5.1       12.7      12.3 
Kindergarten                        3.1       4.0(1)    10.5      11.9 
General elementary                  5.6       5.3       11.9      10.4 
Art/music                           4.2       5.9       17.7      13.0 
Bilingual/ESL                       8.2(1)    4.5(1)     --        -- 
Business                            5.9(1)    7.7(1)    21.1(2)   10.7(2) 
English/language arts               8.5       5.1       18.7      13.9 
Health                              3.8       3.3        6.3(1)   15.6 
Home economics                      6.6(1)    4.2       31.7(2)    -- 
Industrial arts                     3.7(1)    2.7(1)     --        -- 
Math                                4.9       5.2       10.8      10.9 
Reading                             5.1       3.4(1)     6.7(1)   31.8(1) 
Social studies                      5.1(1)    6.7        8.4(1)   10.8(1) 
Science total                       5.4       6.1(1)     9.2       7.3 
  Biology                           3.2       3.7(1)     8.5(2)    6.6(2) 
  Chemistry/physics                 4.1(1)    4.4(2)     7.0(2)    7.7(1) 
  General science/earth science     7.1       8.0(1)    10.9(1)    7.5(1) 
Special education total             7.3       4.9       13.7(2)    9.4(1) 
  Mentally retarded                12.6(1)    3.7(1)     6.4(2)    -- 
  Learning disabled                 4.3       3.2        7.6(2)    3.4(2) 
  Other special education           8.4(1)    5.8       23.7(2)   13.5(2) 
Vocational education                6.7(1)    5.6(1)     0.0      44.1(2) 
Foreign languages                    ++       2.3        ++       14.1 
All others(3)                       5.2       4.8       18.2      19.0 

-- Too few cases for a reliable estimate. 

++ Foreign languages in 1987-88 was included in the "All others" category. 

(1) Coefficient of variation between 30% and 50%. 

(2) Coefficient of variation greater than 50%. 

(3) Includes computer science, remedial education, religion, gifted, prekindergarten, and all 
others (and foreign languages in 1987-88). 

NOTE: The attrition rate is the percentage of teachers who left the teaching profession 
between school years 1987-88 to 1988-89 and 1990-91 to 1991-92 (percent "leavers"). 

SOURCE: U.S. Department of Education, National Center for Education Statistics, 
Teacher Followup Survey, 1988-89 and 1991-92. 

5. Sampling can go wrong 

90s, in Poll: A Good Life Amid Old Ills 



MICHAEL R. KAGAY 



As Americans look to the year 2000, most of them anticipate a better life for 
themselves, but at the same time they foresee a worsening of many of the nation's 
social and economic problems, according to a new Gallup Poll. 

Seventy-seven percent of the 1,234 adults polled said they expected the overall 
quality of their own life to be better by 2000. Similarly, 77 percent anticipated 
that their family life would be better in 10 years' time. Seventy-four percent 
said their financial situation would be better. Eighty-two percent of employed 
adults also predicted their job situation would improve in 10 years. 

Somewhat smaller majorities of Americans also anticipated that by 2000 people 
would be spending more time on leisure and recreation (68 percent), and more time 
with their families (58 percent). A minority said people would be spending more 
time on jobs (38 percent) or household chores (13 percent). 

The poll, conducted by telephone Nov. 16-19, had a margin of sampling error of 
plus or minus four percentage points. 

The participants' optimism about their own lives was accompanied by a more 
pessimistic outlook on many current social and economic problems. Large 
majorities expected by 2000 to see increases in the rate of inflation (74 
percent), the crime rate (71 percent), poverty (67 percent), homelessness (62 
percent), and environmental pollution (62 percent). 



Copyright © 1990 by the New York Times Company. Reprinted by permission. 



of Section 5 are some questions that should be answered by a 
careful account of a sample survey. Which of these questions does 
this newspaper report answer, and which not? Give the answers 
whenever the article contains them. 

1.44. Market research is sometimes based on samples chosen from telephone 
directories and contacted by telephone. The sampling 
frame therefore omits households having unlisted numbers and 
those without phones. 

(a) What groups of people do you think will be underrepresented 
by such a sampling procedure? 

(b) How can households with unlisted numbers be included in 
the sample? 

(c) Can you think of any way to include in the sample households 
without telephones? 

1.45. We have seen that the method of collecting the data can influence 
the accuracy of sample results. The following methods have been 
used to collect data on television viewing in a sample household: 

[Figure: Distribution of scale scores on reading literacy assessment, by age and country: 
School year 1991-92. Two panels, "Age 9, Narrative domain" and "Age 14, Expository domain", 
each on a scale-score axis from 100 to 800. For each country the chart marks the 1st to 10th 
percentile, the average scale score +/- 2 standard errors, and the 90th to 99th percentile. 
Countries legible in this copy: France, United States, Italy, West Germany, Canada (BC), Spain.] 

NOTE: The vertical line at ability score 500 marks the average score for each age group for all participating 
countries. The standard deviation is 100. 

SOURCE: International Association for the Evaluation of Educational Achievement, Study of Reading Literacy, 
How in the World Do Students Read?, 1992. 

QUESTION 2 



IN MAKING A STATEMENT COMPARING 
TWO GROUPS OR ABOUT THE ASSOCIATION 
BETWEEN TWO VARIABLES, DOES THE 
EVIDENCE PROVIDED BY THE DATA SUPPORT 
THE STATEMENT? 



HOW DO WE PROVE OR DISPROVE A 
HYPOTHESIS REGARDING GROUP 
DIFFERENCES OR ASSOCIATIONS? 

(HYPOTHESIS TESTING) 

COULD THE DIFFERENCE OR THE 
ASSOCIATION WE ARE SEEING BE DUE 
TO CHANCE? 

(STATISTICALLY SIGNIFICANT) 

HYPOTHESIS TESTING 



EXAMPLE 



NULL HYPOTHESIS 

H0: THERE IS NO DIFFERENCE IN 
AVERAGE MATH ACHIEVEMENT 
SCORES OF BLACK AND WHITE 
EIGHTH GRADERS. 



ALTERNATIVE HYPOTHESIS 



Ha: THERE IS A DIFFERENCE IN 
THE AVERAGE MATH 
ACHIEVEMENT SCORES OF 
BLACK AND WHITE EIGHTH 
GRADERS. 




TEST OF A HYPOTHESIS 

AN INVESTIGATION OF THE 
CREDIBILITY OF A NULL 
HYPOTHESIS. 

We collect some data on a sample 
and wish to see if these data are 
consistent with the null hypothesis. 



EXAMPLE: 



WHITES: MEAN =276 
BLACKS: MEAN =249 



Condition Indicator 13 



HYPOTHESIS TESTING 
EXAMPLE: 

INDICATOR 13 : 1990 NAEP DATA 
WHITES: MEAN =276 SE=1.1 
BLACKS: MEAN =249 SE=2.3 

Observed difference = 276-249 = 27 

Are these data consistent with the 
null hypothesis? 

How likely is it that we would get 
such a large difference if in fact the 
two population means were the 
same? 

Is this difference real or due to 
chance? 




The chance of getting such a large 
difference if the true means were the 
same is less than .001. This is the "p 
value". 

p value: the probability of getting 
an outcome at least as extreme as 
what we actually got if H0 were true 

If p is small, the evidence against the 
null hypothesis is strong. 
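
A rough version of this calculation can be sketched from the published means and standard errors. The sketch makes two assumptions that are not stated on the overhead: the two groups are independent samples, and a normal approximation is adequate. The standard error of the difference is taken as the square root of the sum of the squared standard errors; the function name is illustrative.

```python
import math
from statistics import NormalDist

def two_sided_p_value(estimate_a, se_a, estimate_b, se_b):
    """Normal-approximation test of the difference between two independent estimates."""
    difference = estimate_a - estimate_b
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    z = difference / se_difference
    # Probability of a difference at least this extreme if the true means were equal.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return difference, se_difference, z, p

difference, se_difference, z, p = two_sided_p_value(276, 1.1, 249, 2.3)
print(f"difference = {difference}, SE of difference = {se_difference:.2f}, z = {z:.1f}")
print("p < .001" if p < 0.001 else f"p = {p:.3f}")
```

The observed difference of 27 points is more than ten standard errors from zero, which is why the p value is far below .001.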




HOW SMALL IS "SMALL"? THIS IS DECIDED 
BY THE SIGNIFICANCE LEVEL: α. 

α = CHANCE YOU ARE WILLING TO TAKE 
THAT YOU WILL REJECT THE NULL HYPOTHESIS 
WHEN IT IS REALLY TRUE. 

IF p IS SMALLER THAN α, THEN WE SAY THE 
RESULT IS STATISTICALLY SIGNIFICANT 
AT THE α LEVEL. 

EXAMPLE: p<.001, α=.05, p<α. 

REJECT THE NULL HYPOTHESIS. 

CONCLUDE MEANS FOR BLACKS AND WHITES 
ARE DIFFERENT. 



STEPS IN HYPOTHESIS TESTING 

(1) SET UP THE NULL AND ALTERNATIVE 
HYPOTHESES. 

The test is designed to assess the strength of 
the evidence against H0. Ha is a statement of the 
alternative we will accept if the evidence against 
H0 is sufficiently strong. 

(2) CHOOSE THE SIGNIFICANCE LEVEL α. 

This states the chance you are willing to take 
that you will reject the null hypothesis when it is 
really true. It is an indication of how much 
evidence against H0 will be decisive. 

(3) FIND THE P VALUE FOR THE OBSERVED 
DATA. 

This is the probability of getting a difference at 
least as extreme as what we got if the null 
hypothesis were true, i.e., the probability that the 
test statistic would weigh against H0 at least as 
strongly as it does for these data if H0 were in fact 
true. 

(4) IF THE p VALUE IS LESS THAN α, REJECT 
THE NULL HYPOTHESIS. THE RESULT IS SAID 
TO BE STATISTICALLY SIGNIFICANT AT LEVEL α. 

HYPOTHESIS TESTING 

HAVING CARRIED OUT THE 
STATISTICAL TEST, 

THE STATISTICIAN WILL TELL 
YOU THE RESULTS BY SAYING 
THAT THE RESULTS ARE OR ARE 
NOT 

"STATISTICALLY SIGNIFICANT". 

IF WE FIND: 




(1) THAT THE DIFFERENCE IS 
STATISTICALLY SIGNIFICANT, 

THIS MEANS THAT 

- the null hypothesis was rejected 

- the data are not consistent with H0 

- chance is not likely to have caused the 
difference we observed 

AND THUS OUR CONCLUSION ABOUT THE 
POPULATION IS THAT 

- "blacks and whites differ in avg math 
achievement". 

IF WE FIND: 

(2) THAT THE DIFFERENCE IS 
NOT STATISTICALLY SIGNIFICANT, 

THIS MEANS THAT 

- the null hypothesis was not rejected 

- the data are not inconsistent with H0 

- chance may have caused the 
difference we observed 

AND THUS OUR CONCLUSION ABOUT THE 
POPULATION IS THAT 

- "we do not have enough evidence to 
conclude that blacks and whites differ 
in avg math achievement". 

WHEN WE SAY THAT CHANCE IS NOT LIKELY 
TO HAVE CAUSED THE DIFFERENCE WE ARE 
SEEING, WHAT DOES "NOT LIKELY" MEAN? 
HOW UNLIKELY IS IT? 

DETERMINED BY THE SIGNIFICANCE LEVEL 

α IS THE PROBABILITY YOU WILL REJECT 
THE NULL HYPOTHESIS WHEN IT IS TRUE, 
I.E., THE CHANCE THAT YOU WILL CONCLUDE 
THE GROUPS ARE DIFFERENT WHEN THEY 
ARE NOT. 

WHAT DOES A STATISTICAL TEST TELL YOU? 



THE ONLY THING A STATISTICAL TEST TELLS 
YOU IS WHETHER CHANCE OR SAMPLING 
VARIABILITY IS LIKELY TO HAVE PRODUCED 
THE RESULTS YOU HAVE OBSERVED. 

A STATISTICALLY SIGNIFICANT DIFFERENCE IS 
A DIFFERENCE WHICH IS TOO LARGE TO HAVE 
OCCURRED BY CHANCE ALONE. 



STATISTICAL SIGNIFICANCE 

vs 

SUBSTANTIVE SIGNIFICANCE 

HYPOTHESIS TESTING 



ALL DIFFERENCES CITED IN NCES 
REPORTS HAVE BEEN SUBJECTED 
TO HYPOTHESIS TESTS AND ARE 
STATISTICALLY SIGNIFICANT 
UNLESS OTHERWISE NOTED. 

Achievement, Attainment, and Curriculum 

Trends in the mathematics proficiency of 9-, 13-, and 17-year-olds 

For 9- and 13-year-olds, mathematics proficiency improved somewhat between 
1973 and 1990, but scores for 17-year-olds showed no improvement over the same period. 

Since 1973, white, black, and Hispanic 
9-year-olds have shown improvement in 
average mathematics proficiency (10, 18, 
and 12 scale points, respectively). Most 
of this improvement occurred between 
1982 and 1990. 

In 1990 large gaps existed between the 
mathematics proficiency of whites and 
their black and Hispanic peers. 
However, for blacks the gaps were 
narrower than they had been in 1973. 

Variability in average mathematics proficiency scores across states was 
[...]. A difference of 35 scale points existed between average eighth-grade students' 
performance in the highest and lowest scoring states (supplemental table 13-5). 



Proficiency in mathematics is an important 
outcome of education. In an increasingly 
technological world, the mathematics skills of 
the nation's workers may be a crucial 
component of economic competitiveness. In 
addition, knowledge of mathematics is 
critical for success in science, computing, 
and a number of other related fields of study. 




Year      Male    Female     Male    Female 

1973      218      220       265      267 
1978      217      220       264      265 
1982      217      221       269      268 
1986      222      222       270      268 
1990      229      230       271      270 

[Age-group column headings and per-cell significance markers are not legible in this copy.] 

Statistically significant difference from 1990. 



Note: Mathematics Proficiency Scale has a range from 0 to 500 

Level 150: Simple arithmetic facts 

Level 200: Beginning skills and understandings 

Level 250: Numerical operations and beginning problem solving 

Level 300: Moderately complex procedures and reasoning 

Level 350: Multi-step problem solving and algebra 










[Table: teachers' reports of the time spent each week on writing instruction. The question 
text is not legible in this copy. Each cell gives the percentage of students, followed by 
their average writing proficiency, with standard errors in parentheses.] 

                   30 Minutes 
                   or Less              60 Minutes           90 Minutes           [heading illegible] 

Nation 
  1992             15(1.6)  259(2.1)    40(2.0)  264(1.5)    22(2.0)  264(1.8)    23(2.3)  265(2.1) 
  1988             30(2.5)              42(2.2)              [illegible]          11(0.6) 

High Ability 
  1992             [illegible] 271(6.6) 36(4.7)  284(4.2)    29(4.8)  282(4.7)    26(4.2)  282(3.0) 

Average Ability 
  1992             15(2.4)  266(3.0)    45(3.1)  266(2.6)    20(2.?)  263(3.5)    20(2.4)  269(1.3) 

Low Ability 
  1992             [illegible] 242(3.9) 36(3.4)  248(3.4)    21(3.5)  245(3.5)    23(4.8)  246(1.9) 

Mixed Ability 
  1992             14(3.2)  262(4.0)    38(4.1)  265(1.5)    23(3.8)  266(4.0)    26(5.0)  264(3.3) 

The standard errors of the estimated percentages and proficiencies appear in parentheses. It 
can be said with 95 percent confidence that for each population of interest, the value for the 
whole population is within plus or minus two standard errors of the estimate for the sample. In 
comparing two estimates, one must use the standard error of the difference (see Appendix for 
details). [...] methods were not available in 1988 to calculate mean 
proficiencies. Percentages may not total 100 percent due to rounding error. 

SOURCE: National Assessment of Educational Progress (NAEP), 1988 and 1992 Writing Assessments. 



Average writing proficiency did not differ significantly by amount of writing 
instruction. Also, teachers' reports on attention to writing instruction were relatively 
uniform across students in classes of different ability levels, though students in 
high-ability classes apparently spent more time on writing instruction than those in 
low-ability classes. For example, although this difference was not statistically significant, 
91 versus 80 percent received an hour or more of instruction per week. 

ASSOCIATION AND CORRELATION 



ASSOCIATION: 

The occurrence together of two or 
more characteristics or events more 
often than would be expected by 
chance. 



CORRELATION: 

A measure of the strength of 
association that assumes a linear 
relationship between the variables. 
The correlation coefficient, r, is a 
number between -1 and 1. 
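
To make the definition concrete, the sketch below computes r from scratch for two small made-up data sets (neither comes from the seminar). The first shows a strong negative linear association, in the spirit of the TV-watching claim; the second shows a perfect curved relationship that r fails to detect, because r only measures linear association.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

# Hypothetical data: daily hours of TV and test scores for five students.
hours_tv = [1, 2, 3, 4, 5]
scores = [90, 85, 80, 70, 65]
r_linear = correlation(hours_tv, scores)

# Hypothetical curved relationship: y is exactly x squared.
xs = [-2, -1, 0, 1, 2]
ys = [4, 1, 0, 1, 4]
r_curved = correlation(xs, ys)

print(f"r for TV hours and scores: {r_linear:.2f}")
print(f"r for the curved relationship: {r_curved:.2f}")
```

A value of r near -1 signals a tight downward-sloping line. A value near 0 says only that there is no linear association, not that the variables are unrelated.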



Section 13.4 / The Correlation Coefficient 

[Figure 13.3 Examples of Various Values of r] 

The coefficient becomes smaller and smaller as the distribution of points 
clusters less closely around the line (Figure 13.3d), and it becomes virtually 
zero (no correlation between the variables) when the distribution approximates 
a circle (Figure 13.3e). Figure 13.3f illustrates one drawback of the 
correlation coefficient: it is ineffective for measuring a relationship that is 
not linear. In this case we observe a neat curvilinear relationship whose 

CORRELATION ≠ CAUSATION 



EXAMPLES: 

1. POLIO AND SOFT DRINKS 

2. STORKS AND BABIES 



ASSOCIATION IS NOT CAUSATION 






Figure 1. A misleading correlation. Soft-drink sales are correlated with 
the incidence of polio. 

[Scatter plot with soft-drink sales on one axis; points are marked as Winter or Summer.] 



Correlation measures association. But association is not the 
same as causation. 
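
The polio and soft-drink example can be sketched numerically. All numbers below are invented for illustration; they are not the data behind Figure 1. The sketch assumes that season drives both variables: within each season sales and polio are unrelated, yet pooling the seasons produces a strong correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weekly figures (assumed for illustration only).
winter_sales = [10, 11, 12, 13]
winter_polio = [3, 2, 2, 3]
summer_sales = [30, 31, 32, 33]
summer_polio = [9, 8, 8, 9]

r_winter = pearson_r(winter_sales, winter_polio)   # within one season
r_summer = pearson_r(summer_sales, summer_polio)   # within the other season
r_pooled = pearson_r(winter_sales + summer_sales,
                     winter_polio + summer_polio)  # seasons mixed together
print(r_winter, r_summer, round(r_pooled, 2))
```

In this made-up data the pooled correlation is high only because summer raises both numbers. Holding season fixed makes the association vanish.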



Part I explained the difference between observational studies and controlled
experiments. The same kind of distinction is useful here. In a laboratory
experiment, the investigator usually varies the independent variable on
his own initiative, and watches the effect on the dependent variable. For
example, Robert Hooke (England, 1635-1703) was able to determine the
relationship between the length of a spring and the load placed on it. He just
hung weights of different sizes on the end of a spring, and watched what
happened. When the load was increased, the spring got longer. When the load
was reduced, the spring got shorter. In this experiment, weight was the
independent variable: Hooke could vary that at will. Length was the dependent
variable. Hooke did not choose its value, but watched how it responded
to weight. Since the weight was under the direct control of the experimenter,
there is no question here about what was causing what. The weight caused the
spring to get longer.



ASSOCIATION BETWEEN TWO CATEGORICAL VARIABLES



Table 4.8. Percentage of eighth graders who cite various probabilities for graduating
from high school, by selected background characteristics

                                      Probability of Completing High School

Student                               Very Sure    Will         Probably    Very Sure
Characteristics                       Will         Probably     Will Not    Will Not
                                      Graduate     Graduate     Graduate    Graduate

TOTAL                                    82.5         15.7         1.1         0.7

RACE
  Asian and Pacific Islanders            77.6         21.1         0.8         0.4
  Hispanic                               70.6         25.6         2.1         1.4
  Black                                  81.5         16.6         1.2         0.7
  White                                  85.0         13.6         0.9         0.6
  American Indian and Native Alaskan     72.1         22.8         3.0         2.1

PARENTS' EDUCATION
  Did Not Finish High School             68.5         25.8         3.4         2.4
  High School Graduate                   80.3         17.7         1.2         0.9
  High School Plus Some College          83.0         15.7         0.8         0.5
  College Graduate                       88.7         10.6         0.4         0.3
  Graduate Degree                        91.3          8.1         0.4         0.1

SES QUARTILE
  Lowest Quartile                        71.8         24.0         2.5         1.7
  25-49%                                 82.0         16.3         1.0         0.7
  50-74%                                 85.1         14.1         0.6         0.3
  Highest Quartile                       91.1          8.4         0.3         0.2

FAMILY INCOME
  Less than $15,000                      73.9         22.6         1.9         1.5
  $15,000 - $50,000                      83.8         14.8         0.8         0.6
  Over $50,000                           90.7          8.7         0.4         0.2

OLDER SIBLINGS WHO HAVE
DROPPED OUT BEFORE GRADUATING
  None                                   84.7         14.0         0.8         0.5
  One                                    71.9         23.9         2.2         2.1
  Two                                    73.4         19.9         3.8         3.0
  Three                                  69.6         27.2         3.2         0.0
  Four                                   60.6         31.9         3.7         3.7
  Five                                   68.1         25.8         3.8         2.3
  Six or more                            71.7         26.2         2.1         0.0

EVER REPEATED A GRADE
  Yes                                    71.2         24.4         2.6         1.7
  No                                     86.4         12.6         0.6         0.4

DAYS OF SCHOOL MISSED
IN PAST FOUR WEEKS
  None                                   86.2         13.0         0.5         0.4
  1 or 2 days                            84.8         14.2         0.7         0.4
  3 or 4 days                            77.4         19.6         2.0         1.1
  5 to 10 days                           74.7         21.3         2.6         1.5
  More than 10 days                      62.8         27.3         4.6         5.3

TIMES LATE FOR SCHOOL
IN PAST FOUR WEEKS
  None                                   86.1         12.9         0.7         0.4
  1 or 2 days                            80.6         17.5         1.2         0.8
  3 or 4 days                            75.1         21.5         2.2         1.3
  5 to 10 days                           73.6         21.3         3.2         1.6
  More than 10 days                      64.1         27.3         3.1         5.4

SOURCE: U.S. Department of Education, National Center for Education Statistics,
National Education Longitudinal Study of 1988: Base Year Student Survey.
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CORRELATION VS CAUSATION 



CORRELATION: ARE TWO VARIABLES 
ASSOCIATED? 



CAUSATION: WILL A CHANGE IN THE 
PREDICTOR ACTUALLY CHANGE THE OUTCOME? 

CORRELATION DOES NOT IMPLY CAUSALITY! 



ESTABLISHING A CAUSAL LINK: 

1. SHOW THAT A CHANGE IN THE PREDICTOR 
PRODUCES A CHANGE IN THE OUTCOME. 

2. SHOW THAT THERE IS NO PLAUSIBLE 
ALTERNATIVE EXPLANATION. 

3. HAVE AN IDEA ABOUT WHAT MECHANISM 
IS AT WORK. 

4. REPLICATE THE STUDY IN DIFFERENT 
POPULATIONS AT DIFFERENT TIMES. 

5. STRENGTH OF ASSOCIATION. 

6. DID THE PREDICTOR PRECEDE THE OUTCOME?
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EXPLANATIONS WHEN AN ASSOCIATION BETWEEN 
TV WATCHING AND PERFORMANCE IS OBSERVED 



Explanation     Type of        Basis for               What's really going on
                Association    Association             in the population?

Chance          Spurious       Random error            TV watching and performance
                                                       are not related

Bias            Spurious       Systematic error        TV watching and performance
                                                       are not related

Effect-cause    Real           Cart before the horse   Poor performance is a cause
                                                       of excess TV watching

Effect-effect   Real           Confounding             Poor performance and excessive
                                                       TV watching are both caused
                                                       by a third extrinsic factor

Cause-effect    Real           Cause and effect        Excess TV watching is a
                                                       cause of poor performance
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REGRESSION 



• A STATISTICAL TECHNIQUE THAT IS USEFUL 
FOR STUDYING THE LINEAR ASSOCIATION 
BETWEEN A DEPENDENT VARIABLE AND ONE 
OR MORE INDEPENDENT VARIABLES. 

• REGRESSION CAN 

- MEASURE THE DEGREE OF 
ASSOCIATION 

- MEASURE THE STATISTICAL 
SIGNIFICANCE OF AN ASSOCIATION 

- MEASURE THE EXTENT TO WHICH THE 
ASSOCIATION EXPLAINS THE VARIATION 
IN THE OUTCOME 

- SERVE AS A BASIS FOR PREDICTION 

- ASSESS THE RELATIVE IMPORTANCE OF 
SEVERAL PREDICTORS 

- ASSESS THE EFFECT OF ONE 
PREDICTOR, CONTROLLING FOR 
OTHERS 
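
Three of the uses listed above (degree of association, share of variation explained, and prediction) can be sketched with a single predictor. The data are hypothetical (hours of instruction per week against a test score), and the helper names `fit_line` and `r_squared` are assumptions made for this sketch, not names from the seminar.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total

# Hypothetical data: hours of instruction per week and a test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)   # degree of association
explained = r_squared(hours, scores, slope, intercept)
predicted_at_6 = intercept + slope * 6       # basis for prediction
print(round(slope, 2), round(intercept, 2), round(explained, 3),
      round(predicted_at_6, 1))
```

The slope says how much the outcome changes per unit of the predictor. It does not by itself say that changing the predictor would cause that change.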
Regression and Prediction




Figure 8.1 The prediction of freshman GPA at Alpha
College from SAT scores.



of predictive error. In the next three sections we will examine what is
meant by the line of "best fit" and learn how to use the formula for that
line in making predictions. Then, we will learn how to attach a margin of
error to our predictions. Finally, we will discover that our newly acquired
knowledge provides another basis for interpreting the correlation coefficient.



8.2 THE LINE OF BEST FIT 

It is all very well to speak of finding the straight line of best fit to the data,
but how do we know when the "best fit" has been achieved? Indeed,
"best fit" could be defined in several ways. Let's look at the way that applies
when we use Pearson r as the index of association and when our purpose
is prediction.

Let Y represent the actual score value of the variable to be predicted
and Y' its corresponding predicted value (X will continue
to represent the predictor variable). Then, an error of prediction is the
discrepancy between the actual and predicted values:

error = (Y - Y')
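
The excerpt breaks off before it states the criterion for "best fit". The sketch below assumes the usual least-squares criterion (choose the line that makes the sum of squared errors of prediction as small as possible) and compares that line with one arbitrary alternative. The data and the alternative line are hypothetical.

```python
def prediction_errors(xs, ys, slope, intercept):
    """Errors (Y - Y') for the line Y' = intercept + slope * X."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sum_squared_errors(errors):
    return sum(e * e for e in errors)

# Hypothetical scores.
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]

# Least-squares line, computed from the standard formulas.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
ls_slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))
ls_intercept = mean_y - ls_slope * mean_x

# An arbitrary competing line, chosen by eye (an assumption for comparison).
alt_slope, alt_intercept = 1.0, 1.5

sse_least_squares = sum_squared_errors(
    prediction_errors(x, y, ls_slope, ls_intercept))
sse_alternative = sum_squared_errors(
    prediction_errors(x, y, alt_slope, alt_intercept))
print(round(sse_least_squares, 2), round(sse_alternative, 2))
```

Under this criterion no other straight line gives a smaller sum of squared errors than the least-squares line for the same data.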
Climate, Classrooms, and Diversity in Educational Institutions

Crime in the schools 



Between 1976 and 1991, blacks were
both more likely to be threatened with
and more likely to be injured with a
weapon in school than whites. In 1991,
for example, about 1 in 10 black and
about 1 in 19 white high school seniors
reported being injured with a weapon
at school. However, there were few
other differences in the in-school
victimization rates of black and white
high school seniors over this period.



Research on effective schools has identified a
safe and orderly environment as a
prerequisite for promoting student academic
success. A lack of school safety can reduce
school effectiveness, inhibit student learning,
and place students who are already at risk
for school failure for other reasons in further
jeopardy. In recent years, educators and
policymakers have voiced growing concern
about increases in the incidence of
school-related criminal behavior.



► For blacks, in most crime categories, 

^metlTi^g^in^^iem d"cT* f ° r 

victimization. ® nes whltes did experience some increase in 

" reported ‘type'of most frequently 

The least frequently reported type of victimizating 6 ^ 11 ? stolen (a PP r °ximately 4 in 10). 
weapon (neariy 1 in 19k ^ With a 

damaged or that they were threatened without a weapon. pr ° perty had been deli berately 
victimization, ond vic,imized ln school, by type of 



[Table column headings: Something stolen from you; Property deliberately damaged; Injured you with a weapon; Threatened you with a weapon; Injured you without a weapon; Threatened you without a weapon. The table values and footnotes did not survive in this copy.]
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Indicator 50 



Percentage of high school seniors reporting being victimized in school, 

by race/ethnicity: 1976-1991 



With a weapon 

Percent 




Without a weapon 

Percent 




SOURCE: University of Michigan. Survey Research Center. Institute for Social Research. Monitoring the Future, unpublished 
tabulations. 
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QUESTION 3 



HOW CAN WE DISPLAY OUR RESULTS 
HONESTLY? 



(MISLEADING GRAPHS) 



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MISLEADING GRAPHS 



1. FLEXIBLE GRID 

2. IRREGULAR SPACING ON GRID. 

3. AXIS DOESN’T START AT 0. 

NEED SCALE BREAK. 

4. VISUAL AREA AND NUMERICAL MEASURE. 

5. IGNORING THE VISUAL IMAGE. 

6. DOUBLE AXES. 

7. PERSPECTIVE 

8. CHANGE SCALES IN MID-AXIS 

9. EMPHASIZE TRIVIAL, IGNORE IMPORTANT 



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1. FLEXIBLE GRID 



CHANGING THE VISUAL IMAGE 



Contracting or expanding the vertical (amount) scale or the horizontal
(time) scale tends to change the visual picture.
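
One way to see the effect is to compute the angle a trend line makes on paper under different grids. The function `apparent_angle` and all the numbers below are assumptions made for illustration. The same rise of 20 units over 10 years is drawn at three different scales.

```python
import math

def apparent_angle(rise, run, y_units_per_inch, x_units_per_inch):
    """Angle in degrees that a trend line makes on paper for given axis scales."""
    height_in = rise / y_units_per_inch   # inches of paper the rise occupies
    width_in = run / x_units_per_inch     # inches of paper the run occupies
    return math.degrees(math.atan2(height_in, width_in))

# Hypothetical trend: the amount rises 20 units over 10 years.
original = apparent_angle(20, 10, y_units_per_inch=10, x_units_per_inch=5)
contracted_vertical = apparent_angle(20, 10, y_units_per_inch=20, x_units_per_inch=5)
contracted_horizontal = apparent_angle(20, 10, y_units_per_inch=10, x_units_per_inch=10)
print(round(original, 1), round(contracted_vertical, 1),
      round(contracted_horizontal, 1))
```

The data never change. Only the grid does, and the trend looks flatter or steeper accordingly.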



original scale 




contracting vertical 





contracting 

horizontal 





CONTRACTING VERTICAL AND EXPANDING HORIZONTAL




EXPANDING VERTICAL
AND CONTRACTING
HORIZONTAL




FIG. 3-1 Contracting and expanding the grid. 
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2. IRREGULAR SPACING ON GRID. 



SKIPPING THE GRID 

A familiar layout in reports and advertisements is seen in Fig.
3-2 A. In order to dramatize the story, a little fudging is done with
the time scale. It is not noticeable at a casual glance that the time
sequence is not uniform. It seems to be a neat, clean-cut,
see-how-we've-grown story. Even the dates lettered at right angles to the
base line make the irregular date plotting less noticeable.

Chart B in Fig. 3-2 shows what the trend looks like when laid
out with the correct grid spacing for each year. Amount plottings
for the given years are the same. Spread out this way, the trend is not as
dramatic, but it is the true story.

Chart G in Fig. 3-12 makes no allowance for the missing years. 



A B 
3. AXIS DOESN’T START AT 0.
NEED SCALE BREAK. 
Figure 2. A low density graph (from Friedman and Rafsky 1981
[ddi = .5]).



the worse it is. Tufte (1983) has devised a scheme for
measuring the amount of information in displays, called
the data density index (ddi), which is “the number of
numbers plotted per square inch.” This easily calculated
index is often surprisingly informative. In popular
and technical media we have found a range from .1 to
362. This provides us with the first rule of bad data
display.
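
The index is a single division. The sketch below applies it to the two examples quoted in this excerpt (18 numbers on 63 square inches, and 4 numbers on 8 square inches). The function name `data_density_index` is an assumption made for illustration.

```python
def data_density_index(numbers_plotted, area_sq_in):
    """Tufte's ddi: numbers plotted per square inch of graphic."""
    return numbers_plotted / area_sq_in

ddi_social_indicators = data_density_index(18, 63)   # the SI3 graphic
ddi_friedman_rafsky = data_density_index(4, 8)       # the Friedman and Rafsky plot
print(round(ddi_social_indicators, 1), ddi_friedman_rafsky)
```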



in²). This is unusual for JASA, where the median data
graph has a ddi of 27. In defense of the producers of this
plot, the point of the graph is to show that a method of
analysis suggested by a critic of their paper was not
fruitful. I suspect that prose would have worked pretty
well also.

Although arguments can be made that high data density
does not imply that a graphic will be good, nor one
with low density bad, it does reflect on the efficiency of
the transmission of information. Obviously, if we hold
clarity and accuracy constant, more information is bet-



Rule 1 — Show as Few Data as Possible (Minimize the
Data Density)

What does a data graphic with a ddi of .3 look like?
Shown in Figure 1 is a graphic from the book Social
Indicators III (SI3), originally done in four colors (original
size 7" by 9") that contains 18 numbers (18/63 = .3).
The median data graph in SI3 has a data density of .6
numbers/in²; this one is not an unusual choice. Shown in
Figure 2 is a plot from the article by Friedman and
Rafsky (1981) with a ddi of .5 (it shows 4 numbers in 8



Labor Productivity:US.vs Japan 
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4. VISUAL AREA AND NUMERICAL MEASURE. 



Visual Area and Numerical Measure 

Another way to confuse data variation with design variation is to use 
areas to show one-dimensional data: 




R. Satet, Les Graphiques (Paris, 193-),
p. 12.



And here is the incredible shrinking doctor, with a Lie Factor of
2.8, not counting the additional exaggeration from the overlaid
perspective and the incorrect horizontal spacing of the data:



THE SHRINKING FAMILY DOCTOR

In California

Percentage of Doctors Devoted Solely to Family Practice



Los Angeles Times, August 5, 1979, p. 3.
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change in the value of the dollar from Eisenhower to 
Carter divided by the actual change. I read and measure 
thus: 

Actual:    (1.00 - .44)/.44 = 1.27

Measured:  (22.00 - 2.06)/2.06 = 9.68

PD = 9.68/1.27 = 7.62

This distortion of over 700% is substantial but by no 
means a record. 
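
The arithmetic can be checked in a few lines. The sketch below uses the four values read from the excerpt (1.00 and .44 for the actual dollar values, 22.00 and 2.06 for the measured sizes). The function name `relative_change` is an assumption. Rounding the two intermediate ratios first, as the text does, gives the printed 7.62. Carrying full precision gives a slightly smaller figure.

```python
def relative_change(first, last):
    """Change from first to last, expressed relative to the last value."""
    return (first - last) / last

actual = relative_change(1.00, 0.44)      # change in the value of the dollar
measured = relative_change(22.00, 2.06)   # change in the size as drawn

pd_as_printed = round(measured, 2) / round(actual, 2)   # 9.68 / 1.27
pd_full_precision = measured / actual
print(round(actual, 2), round(measured, 2),
      round(pd_as_printed, 2), round(pd_full_precision, 2))
```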

A less distorted view of these data is provided in 
Figure 10. In addition, the spacing suggested by the 
Figure 9. An example of how to goose up the effect by squaring
the eyeball (© 1978, The Washington Post).



[Horizontal axis: YEAR; surviving tick labels 1963 and 1968.]

Figure 10. The data in Figure 9 as an unadorned line chart (source citation illegible in this copy).





The "Trash Cans" question, which was in the Data Analysis, Probability,
and Statistics content area, required eighth-grade students to examine a
misleading pictograph and explain why the data display was misleading. To
receive credit for a correct response, students needed to note that the 1980 can
would hold more than twice the 1960 can or that both the width and height of the
can had been doubled. (In particular, doubling the dimensions of the can would
lead to an eightfold increase in the volume of the can, because doubling the
radius [or diameter] results in a fourfold increase when the radius is squared in
V = πr²h.) However, even though the general rather than the specific answer was
scored correct, student performance at the national level was quite low, with 8
percent of the eighth-grade students providing an acceptable response.
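
The eightfold claim follows directly from the volume of a cylinder. The sketch below treats the trash can as a simple cylinder and uses arbitrary starting dimensions. Both are assumptions made for illustration.

```python
import math

def cylinder_volume(radius, height):
    """Volume of a cylinder: V = pi * r^2 * h."""
    return math.pi * radius ** 2 * height

# Arbitrary dimensions for the smaller can (assumed for illustration).
small_can = cylinder_volume(radius=1.0, height=2.0)
# Double both the width (radius) and the height, as in the pictograph.
large_can = cylinder_volume(radius=2.0, height=4.0)

volume_ratio = large_can / small_can
print(volume_ratio)
```

A picture drawn twice as wide and twice as tall therefore suggests eight times the quantity, even if the underlying figure only doubled.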

The ability to read data from a graph, noting the correctness of the graph 
and the implied comparisons, is an important consumer skill. The ability to 
detect errors of the type presented in this question is an important outcome of the 
data analysis/quantitative literacy aspect of the school mathematics curriculum.
While some students seem to have developed this critical skill, the results indicate 
that the vast majority have little conception of the effects that such visual 
representations can have on the possible interpretations of the data. 



EXAMPLE 6: Data Analysis, Statistics, and Probability 



THE UNITED STATES 
IS PRODUCING MORE TRASH, 




1960                    1980



The pictograph shown above is misleading. Explain why.

Answer: [handwritten student response, illegible in this copy]



Overall Percent Correct*
Grade 8: 8 (0.6)



*The standard errors of the estimated percentages appear in parentheses.



U.S. trade with China and Taiwan




ter than less. One of the great assets of graphical techniques
is that they can convey large amounts of information
in a small space.

We note that when a graph contains little or no information
the plot can look quite empty (Figure 2) and
thus raise suspicions in the viewer that there is nothing 
to be communicated. A way to avoid these suspicions is 
to fill up the plot with nondata figurations — what Tufte 
has termed "chartjunk." Figure 3 shows a plot of the 
labor productivity of Japan relative to that of the 
United States. It contains one number for each of three 
years. Obviously, a graph of such sparse information 
would have a lot of blank space, so filling the space 
hides the paucity of information from the reader. 

A convenient measure of the extent to which this
practice is in use is Tufte's "data-ink ratio." This measure
is the ratio of the amount of ink used in graphing
the data to the total amount of ink in the graph. The
closer to zero this ratio gets, the worse the graph. The
notion of the data-ink ratio brings us to the second
principle of bad data display.



the data density; we can sometimes convince viewers 
that we have included the data through the incorpo- 
ration of chartjunk. Hiding the data can be done either 
by using an overabundance of chartjunk or by cleverly 
choosing the scale so that the data disappear. A mea- 
sure of the success we have achieved in hiding the data 
is through the data-ink ratio. 

3. SHOWING DATA ACCURATELY 

The essence of a graphic display is that a set of num- 
bers having both magnitudes and an order are repre- 
sented by an appropriate visual metaphor — the mag- 
nitude and order of the metaphorical representation 
match the numbers. We can display data badly by ignor- 
ing or distorting this concept. 

Rule 3 — Ignore the Visual Metaphor Altogether 



5. IGNORING THE VISUAL IMAGE.



Rule 2 — Hide What Data You Do Show 
(Minimize the Data-ink Ratio) 

One can hide data in a variety of ways. One method
that occurs with some regularity is hiding the data in the
grid. The grid is useful for plotting the points, but only
rarely afterwards. Thus to display data badly, use a fine
grid and plot the points dimly (see Tufte 1983,
pp. 94-95 for one repeated version of this).

A second way to hide the data is in the scale. This
corresponds to blowing up the scale (i.e., looking at the
data from far away) so that any variation in the data is
obscured by the magnitude of the scale. One can justify
this practice by appealing to "honesty requires that we
start the scale at zero," or other sorts of sophistry.

In Figure 4 is a plot (from SI3) that effectively hides
the growth of private schools in the scale. A redrawing
of the number of private schools on a different scale
conveys the growth that took place during the mid-1950's
(Figure 5). The relationship between this rise and
Brown vs. Topeka School Board becomes an immediate
question.

To conclude this section, we have seen that we can
display data badly either by not including them (Rule 1)



If the data are ordered and if the visual metaphor has 
a natural order, a bad display will surely emerge if you 
shuffle the relationship. In Figure 6 note that the bar 
labeled 14.1 is longer than the bar labeled 18. Another 
method is to change the meaning of the metaphor in the 
middle of the plot. In Figure 7 the dark shading repre- 
sents imports on one side and exports on the other. This 
is but one of the problems of this graph; more serious 
still is the change of scale. There is also a difference in 
the time scale, but that is minor. A common theme in 
Playfair's (1786) work was the difference between imports
and exports. In Figure 8, a 200-year-old graph
tells the story clearly. Two such plots would have illustrated
the story surrounding this graph quite clearly.

Rule 4— Only Order Matters 

One frequent trick is to use length as the visual metaphor
when area is what is perceived. This was used quite
effectively by The Washington Post in Figure 9. Note
that this graph also has a low data density (.1), and its
data-ink ratio is close to zero. We can also calculate
Tufte's (1983) measure of perceptual distortion (PD)
for this graph. The PD in this instance is the perceived

The American Statistician, May 1984, Vol. 38, No. 2
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6. DOUBLE AXES 



Line Charts






lat is, those using two or 
9 <e units, are not suitable 
hart is apt to be mislead- 
^ader is not familiar with 

re used, they should start 
iould be taken to identify 
0 mount scale. Both curve 
s uni t identification, that 
In this chart, the vertical 
er to “Population," while 
al Income." 

es, one should be certain 
9 e directly opposite each 
:lude the plotted amount 
> include 69 millions, the 
i. whereas each scale unit 
SO. Such a process divides 
into four equal units (see 
• n zero. 

imponents of a multiple- 
-r. The charts in Fig. 4-13 
lart in black and white or 



FIG. 4-12 Multiple-amount scales.
Practical Charting Techniques



THE MULTISCALE COMPLEX 

You will come across numerous charts using two or more scales
purporting to prove a point. Beware of them. It is too easy to adjust 
the scales to make one trend visually appear greater in amount 
and more important than another trend. 

Figure 3-9 shows that by changing the population scale in the 
chart in Fig. 4-12 the “Personal Income" trend assumes more 
importance. 

Check to see that all scales begin from zero and that there is 
a scale unit relationship (see the discussion of multiple scales in 
Sec. 4). 




FIG. 3-9 Scrutinize the multiscale chart.
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7. PERSPECTIVE 
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Example 86A Perspective 

Perspective diagrams are hard to interpret. Fig. 86 is supposed to depict
the change in the national debt from about 1860 to the present time.²³
This presentation grossly distorts the amplitude of the recent fluctuations.
The visual impression is that the debt in 1948 is about 10§ times the debt 




of 1920, but the ratio between 1948 and 1920 computed from the debt
figures is only 5). The 1948 figure appears to be about 63 times the 1860
figure, but actually was only 16 times it. Thus, the chart gives two to four
times the legitimate impact. The purpose of any chart is to present the facts
clearly and simply. Such a perspective diagram does neither. It is easy to
suspect that those who use charts that distort may not have a good case.

Example 86B Deceptive Changes of Scale

Fig. 87A sketches the general appearance of a misleading series of
charts relating to sales of U. S. Government Series E bonds in the period
1941-1944. It was presented as a model of what “a lively imagination in
selecting and compressing data" can do.²⁴

23. This is the cover design used by the Committee on Public Debt Policy for its
National Debt Series, issued between World War II and the Korean War.

24. J. A. Livingston, “Charts Should Tell A Story,” Journal of the American Statistical
Association, Vol. 40 (1945), pp. 342-350.
Rule 5 — Graph Data Out of Context

Often we can modify the perception of the graph
(particularly for time series data) by choosing carefully
the interval displayed. A precipitous drop can disappear
if we choose a starting date just after the drop. Similarly,
we can turn slight meanders into sharp changes by
focusing on a single meander and expanding the scale.

Often the choice of scale is arbitrary but can have profound
effects on the perception of the display. Figure 11
shows a famous example in which President Reagan
gives an out-of-context view of the effects of his tax cut.

The Times' alternative provides the context for a deeper
understanding. Simultaneously omitting the context as
well as any quantitative scale is the key to the practice
of Ordinal Graphics (see also Rule 4). Automatic rules
do not always work, and wisdom is always required.

In Section 3 we discussed three rules for the accurate
display of data. One can compromise accuracy by ignoring
visual metaphors (Rule 3), by only paying attention
to the order of the numbers and not their magnitude
(Rule 4), or by showing data out of context [...]
distortion as a way of measuring the extent to which the
accuracy of the data has been compromised by the display.
One can think of modifications that would allow it
to be applied in other situations, but we leave such
expansion to other accounts.

8. CHANGE SCALES IN MID-AXIS



subtle (and not so subtle) techniques can be used to effectively
obscure the most meaningful or interesting aspects
of the data. It is more difficult to provide objective
measures of presentational clarity, but we rely on the
reader to judge from the examples presented.

Rule 6— Change Scales in Mid-Axis 

This is a powerful technique that can make large differences
look small and make exponential changes look
linear.
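
The second effect can be sketched numerically. The figures below are hypothetical: a quantity that doubles every period. If the axis labels 1000, 2000, 4000 and so on are printed at equal spacing, the plotted height is in effect the base-2 logarithm of the value, and the doubling series climbs in equal steps like a straight line.

```python
import math

# Hypothetical series that doubles each period (exponential growth).
values = [1000 * 2 ** k for k in range(6)]

# Honest axis: plotted height is proportional to the value itself.
honest_heights = values
# Distorted axis: labels 1000, 2000, 4000, ... printed at equal spacing,
# so the plotted height is log2(value / 1000).
distorted_heights = [math.log2(v / 1000) for v in values]

honest_steps = [b - a for a, b in zip(honest_heights, honest_heights[1:])]
distorted_steps = [b - a for a, b in zip(distorted_heights, distorted_heights[1:])]
print(honest_steps)      # steps keep growing
print(distorted_steps)   # steps are all equal
```

A logarithmic axis is a legitimate tool when it is labeled as such. The problem described here is a scale that changes without telling the reader.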

In Figure 12 is a graph that supports the associated 
story about the skyrocketing circulation of The New 
York Post compared to the plummeting Daily News 
circulation. The reason given is that New Yorkers 
“trust” the Post. It takes a careful look to note the 
700,000 jump that the scale makes between the two 
lines. 

In Figure 13 is a plot of physicians' incomes over
time. It appears to be linear, with a slight tapering off
in recent years. A careful look at the scale shows that it



4. SHOWING DATA CLEARLY 

In this section we discuss methods for badly dis- 
playing data that do not seem as serious as those de- 



THE NEW YORK TIMES. SUNDAY. AUGUST 2. 1981 



The soaraway Post 
— the daily paper 
New Yorkers trust 
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Figure 11. The White House showing neither scale nor context 
(© 1981, The New York Times, reprinted with permission).
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Figure 12. Changing scale in mid-axis to make large differences 
small (© 1981, New York Post).
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Figure 13. Changing scale in mid-axis to make exponential growth 
linear (© The Washington Post).

9. EMPHASIZE TRIVIAL, IGNORE IMPORTANT

Rule 7 — Emphasize the Trivial (Ignore the Important)

Figure 15. Emphasizing the trivial: Hiding the main effect of sex
differences in income through the vertical placement of plots (from
SI3).

Sometimes the data that are to be displayed have one
important aspect and others that are trivial. The graph
can be made worse by emphasizing the trivial part. In
Figure 15 we have [...] individuals are paid [...]
ones and that chang[...]
dollars are reasonably constant. The comparison of
greatest interest and current concern, comparing salaries
between sexes within education level, must be
made clumsily by vertically transposing from one graph
to another. It seems clear that Rule 7 must have been
operating here, for it would have been easy to place the
graphs side by side and allow the comparison of interest
to be made more directly. Looking at the problem from
a strictly data-analytic point of view, we note that there
are two large main effects (education and sex) and a
small time effect. This would have implied a plot that

INCOMES OF DOCTORS VS. OTHER PROFESSIONALS



MEDIAN INCOME OF YEAR-ROUND FULL TIME WORKERS
25-34 YEARS OLD BY SEX AND EDUCATIONAL ATTAINMENT:
1968-1977 (IN CONSTANT 1977 DOLLARS)
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Figure 14. Data from Figure 13 redone with linear scale (from 
Wainer 1980).
Figure 16. Figure 15 redone with the large main effects emphasized
and the small one (time trends) suppressed.
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APPRAISAL OF CLAIMS MADE 
ABOUT DATA 

WHEN TO BELIEVE THEM 
WHEN TO BE SKEPTICAL 
WHEN THEY SHOULD BE IGNORED 



Be skeptical about believing estimates or 
differences associated with: 

1. Large standard errors

2. Wide confidence intervals

3. Results which are not statistically significant

Not statistically significant does not mean "no
difference".

Statistical significance is not the same as 
substantive significance. 

Correlation does not imply causation. 

Examine graphs carefully. Be skeptical. 
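
The first three cautions can be sketched with one comparison. The two percentages (91 and 80) come from the writing-instruction example earlier in this paper. The standard errors of 6.0 and 5.0 are invented for illustration, and the samples are assumed to be independent. The sketch computes the standard error of the difference, a 95 percent confidence interval, and the z statistic.

```python
import math

# Percentages from the writing-instruction example quoted earlier.
pct_high_ability = 91.0
pct_low_ability = 80.0

# Standard errors are HYPOTHETICAL; the source does not report them here.
se_high = 6.0
se_low = 5.0

difference = pct_high_ability - pct_low_ability
# Standard error of a difference between two independent estimates.
se_difference = math.sqrt(se_high ** 2 + se_low ** 2)

z = difference / se_difference
ci_lower = difference - 1.96 * se_difference
ci_upper = difference + 1.96 * se_difference
significant_at_05 = abs(z) > 1.96

print(round(difference, 1), round(se_difference, 2), round(z, 2),
      (round(ci_lower, 1), round(ci_upper, 1)), significant_at_05)
```

With standard errors this large, the interval runs from a small negative difference to a large positive one. The data are consistent with no difference and also with a substantial one, which is why "not significant" should not be read as "no difference".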



SOME BASIC CONCEPTS OF RESEARCH DESIGN 



- Operationalizing your terms 

E.g. "Motivated to Learn" 

- Selection Bias 

E.g. Magazine Study 
E.g. Teacher Evaluation 

- Need for Control Group 

E.g. Science Major 
E.g. Small Classes 
E.g. Persistence in School 
E.g. NAEP Reading Scores 

- Nonresponse Bias 

E.g. Survey on attitudes toward marriage 

- Confounding 

E.g. Television teaching 
E.g. Public/Private Schools 

- Validity 

E.g. Motivated to learn 
E.g. Urbanicity codes 

- Reliability 

E.g. Urbanicity codes 
E.g. Achievement tests 

- Generalizability/External Validity

E.g. Head Start 
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KEY PRINCIPLE IN EVALUATING RESEARCH 
CONCLUSIONS : 



WHEN YOU COMPARE TWO GROUPS WHICH DIFFER ON
SOME CHARACTERISTIC AND FIND THEY DIFFER IN
OUTCOMES, YOU WANT TO BE ABLE TO CONCLUDE
THAT THIS CHARACTERISTIC IS PROBABLY
RESPONSIBLE FOR THE DIFFERENCE.

TO DO SO, YOU MUST EXAMINE AND RULE OUT 
OTHER COMPETING EXPLANATIONS. 



Operationalizing Terms 



Term = "motivated to learn mathematics"



Possible operationalizations:

1. As shown by enthusiasm in class.

2. As judged by the student's math
teacher using a rating scale she
developed.

3. As measured by the "math interest"
questionnaire.

4. As shown by attention to math tasks
in class.

5. As reflected by achievement in math.

6. As indicated by records showing
enrollment in math electives.

7. As shown by effort expended in math
class.

8. As indicated by number of optional
assignments completed.

9. As demonstrated by reading math
books outside school.
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STUDY 



A professor did a study to evaluate student
opinion of her performance in a large
lecture course.

She asked all students who came to her
office hours during a three-week period in
the middle of the semester to fill out her
questionnaire.

The students gave her high marks for
accessibility, openness, and willingness to
talk to students.



WHAT'S THE PROBLEM? 

Selection Bias:

This is a "convenience sample", not a
random sample. Students who come to the
professor's office have already decided she
is accessible. By involving only these
students, she is stacking the deck in her
favor.

Selection bias refers to factors introduced 
into the selection of the study population 
that predetermine the outcome of the study. 
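
A small sketch of the office-hours problem. The class below is entirely made up: 100 students, each with an accessibility rating from 1 to 5, and the assumption that the students who visit office hours are the ones who already rate the professor highly.

```python
# Each student: (came_to_office_hours, accessibility rating on a 1-5 scale).
# All values are hypothetical.
students = (
    [(True, 5)] * 10
    + [(True, 4)] * 5
    + [(False, 4)] * 15
    + [(False, 3)] * 40
    + [(False, 2)] * 30
)

def mean(values):
    return sum(values) / len(values)

# Convenience sample: only the students who showed up at office hours.
convenience_sample = [rating for came, rating in students if came]
# What a census of the whole class would have shown.
whole_class = [rating for _, rating in students]

convenience_mean = mean(convenience_sample)
class_mean = mean(whole_class)
print(round(convenience_mean, 2), round(class_mean, 2))
```

In this invented class the convenience sample overstates the average rating by more than a full point, and no amount of extra office-hours respondents would fix that.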



Light, Willett, Singer 



STUDY 



A faculty member at a highly selective 
college was distressed to discover that 
nearly a third of students who entered his 
school as science majors switched to other 
fields before graduation. The colleague
decided this dropout rate was too high and 
deserved immediate corrective action. He 
thought it reflected inadequacies in the 
science program, so he encouraged a 
curriculum reform committee to consider 
changes that might improve persistence. 



It was later discovered that, in fact, this 
dropout rate was actually much lower than 
the rates at almost any similar school. 

Many felt that this college's program 
indeed may have been exemplary. 



WHAT'S THE PROBLEM? 



Lack of a control/comparison group. 



Light, Singer, Willett 
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Understanding Relationships: 
Using Comparisons 

What is a comparison group? 

Why do you need a comparison group? 
What is an appropriate comparison group? 



Understanding Relationships: 

Using Comparisons-cont. 

• Why do you need a comparison group? 

A comparison group provides a standard by which to 
judge your results. Without a comparison group, you 
CANNOT rule out rival explanations of the results you 
observe. 

Example 1-Teacher satisfaction with small classes

In this hypothetical example, a researcher found that
over 90 percent of teachers in elementary classrooms
with fewer than 15 students were "highly satisfied"
with their teaching assignments. She recommended
that elementary schools uniformly adopt smaller class
sizes, regardless of the expense.

Example 2-Dropout rates from science courses 

A researcher found that nearly one-third of students 
who entered a highly selective college as science 
majors switched to other fields before graduation. 

He recommended that a curriculum committee 
consider changes that might improve persistence. 
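
The point of Example 2 can be sketched in a few lines of Python. The peer-school rates below are invented for illustration (the overheads give no such figures); only the "nearly one-third" value comes from the example. The sketch simply places the college's own rate next to a comparison group before judging it.

```python
# Hypothetical figures for illustration only; the source gives no peer rates.
own_switch_rate = 0.32  # "nearly one-third" of science majors switched fields

# Assumed switch rates at comparable selective colleges (invented values).
peer_switch_rates = [0.41, 0.45, 0.38, 0.50, 0.44]


def compare_to_peers(own_rate, peer_rates):
    """Return the peer average and the number of peers with a higher rate."""
    peer_mean = sum(peer_rates) / len(peer_rates)
    peers_higher = sum(1 for rate in peer_rates if rate > own_rate)
    return peer_mean, peers_higher


peer_mean, peers_higher = compare_to_peers(own_switch_rate, peer_switch_rates)
print(f"Own rate: {own_switch_rate:.0%}; peer average: {peer_mean:.0%}")
print(f"Peers with a higher rate: {peers_higher} of {len(peer_switch_rates)}")
```

Read alone, a rate near one-third looks alarming; read against the assumed comparison group, it is the lowest in the set.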
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Understanding Relationships: 
Using Comparisons-cont. 



What is a comparison group? 

A comparison group defines the interpretation of the 
result that you are reporting. It establishes the 
baseline against which research results are judged. 

Example 1-Trends in Reading Proficiency

The Condition of Education, 1993 reports Trends 
in Reading Proficiency using three kinds of 
comparisons-historical comparisons, matched 
group comparisons, and comparisons against a 
standard. 

Average reading proficiency has increased for 17 
year olds since 1971, but not for 9 and 13 year 
olds. 

The gap between the reading proficiency of black 
and white 13 and 17 year olds has narrowed 
since 1971. 

On average, 9 year olds do not demonstrate 
reading proficiency at the level where they can 
interrelate ideas and make generalizations
(anchor point).



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Achievement Attainment and Curriculum 

Trends in the reading proficiency of 9-, 13-, and 17-year-olds 

► Average reading proficiency of 9- and 13-year-olds was the same in 1990 as in
1971; for 17-year-olds it was somewhat higher in 1990 than in 1971.

► Average reading proficiency of black
students at all three ages was higher in
1990 than in 1971.

► Hispanic 17-year-olds were reading
better in 1990 than in 1975.

► Between 1971 and 1988, 13- and 17-year-old
blacks narrowed gaps between their
reading proficiency scores and those of 
their white counterparts. Similarly, 
between 1975 and 1988, 17-year-old 



Reading skills are basic to the educational
process. When students fall behind in their
reading proficiency, they may find it difficult
to benefit from other aspects of the
curriculum. In the future, poor readers may
also find it difficult to participate effectively in
an economy requiring increasingly
sophisticated job skills.




Statistically significant difference from 1975 for Hispanics.
Level 150: Simple discrete reading tasks 
Level 200: Partial skills and understanding 
Level 250: Interrelate ideas, and make generalizations 
Level 300: Understands relatively complicated information
Level 350: Learns from specialized reading materials 
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Average reading proficiency, by age and race/ethnicity: 1971-1990 



[Figure: three panels (9-year-olds, 13-year-olds, 17-year-olds); vertical axis: scale score]
SOURCE: National Assessment of Educational Progress, Trends in Academic Progress: Achievement of American Students in
Science, 1969-70 to 1990; Mathematics, 1973 to 1990; Reading, 1971 to 1990; and Writing, 1984 to 1990, 1991.
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Understanding Relationships: 
Using Comparisons-cont. 



Example 2-Persistence in School 

The Condition also reports on students' 
persistence in school using both historical and 
group comparisons. 

Persistence rates among college students 
increased between 1972 and 1991. 

The high school persistence rate for students 
from high income families is about 10 percent 
higher than the rate for students from low
income families. 



Access, Participation, and Progress



Persistence in school 

► Between 1990 and 1991, 96 percent of 
15- to 24-year-olds in grades 10-12 
stayed in school or completed high 
school. The other side of this statement 
is that 4 percent dropped out before 
completion (although some of these 
dropouts may have re-enrolled during a 
subsequent school year). 

► The high school persistence rate for 
students from high income families is
about 10 percent higher than the rate 
for students from low income families. 

The difference in rates between 
students from high and middle income 
families is small, about 3 percent (see 
supplemental table 5-2). 

► In October 1991, 84 percent of college students who had been enrolled in their first, 
second, or third year of college the previous October were still enrolled. 

► Persistence rates among college students at each level increased between 1972 and 1991 
(supplemental table 5-4). 



A measure of persistent attendance is the 
proportion of students enrolled in 2 
consecutive years. Students who do not 
complete high school face a decreased 
opportunity for assuming a successful and 
fully functional place in the American 
workplace and society at large. Persistent 
attendance is strongly associated with 
completing high school. In college, both
persistent attendance and full-time
attendance are strongly associated with
completion of a 4-year degree. Those who
attend part-time or stop out (i.e., have
periods of nonattendance) are less likely to 
complete a degree. 



Percentage of high school and college students enrolled the previous October who are 
enrolled again the following October: 1972-1991 



         High school students,              College students,
         grades 10-12, ages 15-24           1st-3rd years, ages 16-24
October  Total  White  Black  Hispanic      Total  White  Black  Hispanic
1972     93.9   94.7   90.5   88.8          77.7   78.1   71.3   78.1
1973     93.7   94.5   90.1   90.0          76.7   76.8   77.2   73.8
1974     93.3   94.2   88.4   90.1          77.5   77.4   74.3   76.0
1975     94.2   95.0   91.3   89.1          79.3   79.9   77.0   72.8
1976     94.1   94.4   92.6   92.7          79.2   79.3   81.3   74.9
1977     93.5   93.9   91.4   92.7          79.7   79.3   79.1   75.9
1978     93.3   94.2   89.8   87.7          77.7   77.8   75.3   76.7
1979     93.3   94.0   90.1   90.2          77.8   78.4   73.6   72.4
1980     93.9   94.8   91.8   88.3          79.0   60.7   71.0   69.2
1981     94.1   95.2   90.3   89.3          78.0   79.4   77.3   72.5
1982     94.5   95.3   92.2   90.8          80.4   81.7   74.6   77.4
1983     94.8   95.6   93.0   89.9          80.3   81.1   74.8   74.4
1984     94.9   95.6   94.3   88.9          79.1   79.8   74.2   72.8
1985     94.8   95.7   92.2   90.2          79.7   81.0   71.4   67.7
1986     95.3   96.3   94.6   88.1          80.2   80.5   74.4   81.7
1987     95.9   96.5   93.6   94.6          81.3   87.9   69.6   74.9
1988     95.2   95.8   94.1   89.6          83.0   83.7   78.0   77.0
1989     95.5   96.5   92.7   92.2          83.8   84.3   79.0   81.1
1990     96.0   96.7   95.0   92.1          81.8   81.7   79.4   79.7
1991     96.0   96.8   94.0   92.7          84.1   84.4   77.8   80.8



NOTE: High school students were either enrolled again the following October or had graduated. See supplemental note to 
Indicator 4 for details on how the persistence rates in this table are calculated. Not shown separately but included in the total
are non-Hispanics who are neither black nor white. Data for 1987 through 1991 reflect new editing procedures instituted by the 
Bureau of the Census for cases involving missing school enrollment items. 



SOURCE: U.S. Department of Commerce. Bureau of the Census. October Current Population Surveys. 
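
As a small illustration of a historical comparison, the sketch below takes the total persistence rates for 1972 and 1991 as read from the table above and expresses the change in percentage points. The dictionary layout and the function name are assumptions made for this example only.

```python
# Total persistence rates (percent) for the first and last years in the table.
persistence = {
    "high school": {1972: 93.9, 1991: 96.0},
    "college": {1972: 77.7, 1991: 84.1},
}


def change_in_points(series, start, end):
    """Change between two years, in percentage points (one decimal place)."""
    return round(series[end] - series[start], 1)


changes = {
    level: change_in_points(rates, 1972, 1991)
    for level, rates in persistence.items()
}
for level, delta in changes.items():
    print(f"{level}: {delta:+.1f} percentage points, 1972 to 1991")
```

Here each level serves as its own comparison group: the 1972 value is the baseline against which the 1991 value is judged.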
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Indicator A 



Percentage of high school students in grades 10-12 and from ages 15-24 enrolled 
in the previous October and again the following October*: 1972-1991 



[Figure: two panels, by race/ethnicity and by family income; vertical axis: percent]

* Or who had completed high school

NOTE: Low income is defined as the bottom 20 percent of all family incomes; high income is defined as the top 20 percent
of all family incomes; and middle income is defined as the 60 percent of family incomes between high and low incomes.
SOURCE: U.S. Department of Commerce, Bureau of the Census, October Current Population Surveys.
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STUDY 



In 1987, author and social investigator Shere Hite 
published her third book on men and women. Her latest 
findings on women's attitudes about men, sex, and 
personal and marital relationships put her on the cover 
of Time and launched a flood of news stories and TV
talk. 

100,000 detailed questionnaires, 

127 questions 

women in groups of many kinds all over the country 

4,500 replies

Report

84 percent of the women in her study were
dissatisfied with their marital or other intimate
relationships,

78 percent said they were generally not treated as
equals by men,

70 percent of those married more than five years had had
affairs.

And so on, with a number of answers and Hite's 
elaborations indicating that women in general are 
mainly unhappy with their relationships. 



WHAT'S THE PROBLEM? 



NONRESPONSE BIAS 

Women in general? At one point, she said "no one can
generalize" from her findings. Yet, she also claimed
that her respondents were typical.

Critics said her sample was almost certainly heavily
weighted with the unhappiest women, those who took the
time to answer the lengthy questionnaire. Many women
probably feel the same way - but we have no idea how
many.

A Washington Post-ABC polling team questioned by phone
a representative sample of women and men across the
nation. They found that

93 percent of the married and single women said
they were satisfied with their relationships,

81 percent said they were treated as equals most of the
time,

only 7 percent reported affairs.
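
The arithmetic behind nonresponse bias can be sketched as follows. The response rate uses the two counts given above (100,000 questionnaires, 4,500 replies). The reply propensities are invented purely for illustration, and the 7 percent dissatisfied share is simply 100 minus the 93 percent satisfied reported by the poll; none of this is an estimate of what actually happened in the Hite study.

```python
mailed = 100_000
returned = 4_500
response_rate = returned / mailed  # 0.045, i.e. 4.5 percent


def share_among_respondents(p_unhappy, reply_if_unhappy, reply_if_happy):
    """Expected share of unhappy people among those who reply.

    p_unhappy:        share of the whole population that is unhappy
    reply_if_unhappy: probability that an unhappy person replies
    reply_if_happy:   probability that a happy person replies
    """
    unhappy_replies = p_unhappy * reply_if_unhappy
    happy_replies = (1 - p_unhappy) * reply_if_happy
    return unhappy_replies / (unhappy_replies + happy_replies)


# Hypothetical propensities: unhappy women are assumed far more likely to reply.
observed = share_among_respondents(
    p_unhappy=0.07, reply_if_unhappy=0.40, reply_if_happy=0.01
)

print(f"Response rate: {response_rate:.1%}")
print(f"Unhappy share in population: 7%; among respondents: {observed:.0%}")
```

Under these assumed propensities, a small unhappy minority makes up most of the replies, which is the pattern the critics describe.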
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STUDY 



A study was done to compare the
effectiveness of televised instruction 
versus regular classroom instruction. 
Students were randomly assigned to one of 
the two groups. At the end of the course, 
the investigator compared the progress of 
students in the two groups, found students 
in the television group performed better, 
and concluded that the television approach 
was more effective. 



POINTS TO CONSIDER 

Confounder - a factor that differs between
the treatment and control groups and is
likely to affect the outcome.

The confounder here is the quality of the
teacher. When this type of research was 
done, the usual procedure was to select the 
best teacher available and give this person 
the full day to prepare the lesson. 

Better controlled studies found no 
difference between the two groups. 



FIGURE 12.1

Illustration of Threats to Internal Validity

The teachers in this fictitious
example are discussing the results
of a study which show that students
who attend private high schools had
higher achievement (as shown by test
scores) than students who attend
public high schools.

(Subject Characteristics)
"Maybe those attending private schools come
from more affluent homes - so it is
not the type of school that makes the
difference."

(Loss of Subjects)
"Hold on - perhaps private schools are more
likely to expel the poorer students. So it's
this policy, not the nature of the school,
that makes the difference."

(Location)
"Wait a minute. Private schools may have more
resources (materials, technologies,
parent support), that could account
for the differences instead of the
type of school organization."

(Instrumentation)
"Is it likely that the tests used to assess
achievement are biased in favor of the
curricula found in private schools? Could the
procedures used in testing favor the private
school students (testing conditions,
adherence to instructions)?"

(Testing)
"Maybe private school students have more
opportunities to practice taking such tests.
This could account for their higher
performance."

(History)
"Private school students may achieve higher
scores, not because of the type of school, but
because they are exposed to a broader range of
experiences. Their parents are more affluent."

(Maturation)
"Perhaps private school students spend more
years in high school than those in public
schools."*

(Attitude of Subjects)
"Perhaps it is the status and self-esteem
associated with attending a private school that
motivates these students to achieve at a higher
level, rather than the type of school
organization."

(Regression)
"Maybe there were a lot of students who scored
really low on the pre-test in the private
schools."

(Implementation)
"Maybe private schools have more experienced or
dedicated teachers and this is the reason
for the difference."†

Note: We are not implying that any of these statements are necessarily true; our guess is that some are and some are not.
*This seems unlikely.
†If these teacher characteristics are a result of the type of school, then they do not constitute a threat.







VALIDITY AND RELIABILITY 




[Figure: four panels labeled "valid and reliable"; "not valid, reliable"; "valid, not reliable"; "not valid, not reliable"]

reliability: the reproducibility of a result when
a test or study is repeated

validity: how well a measure actually assesses
what you want it to.
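
The four combinations can be made concrete with a small sketch. It treats validity, as a simplification, as the absence of systematic bias (how far the average measurement sits from the true value) and reliability as low spread across repeated measurements. The instruments, their readings, and both thresholds are invented for illustration.

```python
from statistics import mean, pstdev

TRUE_VALUE = 100.0  # hypothetical true quantity being measured

# Invented repeated measurements from four hypothetical instruments.
instruments = {
    "A": [99.5, 100.2, 100.1, 99.8, 100.4],    # on target, tightly clustered
    "B": [109.6, 110.1, 110.3, 109.9, 110.1],  # off target, tightly clustered
    "C": [92.0, 108.5, 97.0, 104.0, 98.5],     # on target on average, scattered
    "D": [118.0, 101.0, 125.0, 107.0, 99.0],   # off target and scattered
}


def describe(values, true_value, max_bias=1.0, max_spread=2.0):
    """Return (valid, reliable) under the assumed thresholds."""
    bias = mean(values) - true_value   # systematic error
    spread = pstdev(values)            # variation across repeats
    return abs(bias) <= max_bias, spread <= max_spread


results = {name: describe(vals, TRUE_VALUE) for name, vals in instruments.items()}
for name, (valid, reliable) in results.items():
    print(f"Instrument {name}: valid={valid}, reliable={reliable}")
```

Instrument B shows why reliability alone is not enough: it gives nearly the same answer every time, and that answer is wrong.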



Validity and Reliability-cont. 

Example: SASS Community Type 
VALIDITY: 

Is the community in which the school is located
really at the level of urbanicity that the principal
reports? (Example of Fairfax City Schools)

RELIABILITY: 

If you were to readminister the questionnaire 
tomorrow, would the principal respond 
differently? 



Locale Codes (columns) Versus Self-Report (rows)

                      Large   Mid-size   Large City   Midsize City   Large   Small
                      City    City       Fringe       Fringe         Town    Town    Rural
Small City            0.27%   7.3%       6.7%         11.8%          7.8%    51.1%   15.0%
Suburb of Med. City   0.50%   20.8%      16.9%        34.5%          2.2%    11.9%   13.2%



Collapsed Locale Codes (columns) Versus Self-Report (rows)

                      Urban   Suburban   Small Town/Rural
Small City            7.6%    26.3%      66.1%
Suburb of Med. City   21.3%   53.6%      25.1%
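
The collapsed table can be reproduced from the detailed one. The grouping used below (Large Town counted with the two fringe codes as Suburban) is not stated on the overhead; it is inferred from the fact that the collapsed percentages match the detailed ones only under that grouping. The names in the code are chosen for this sketch.

```python
# Detailed locale-code percentages for the two self-report rows shown above.
locale_pct = {
    "Small City": {
        "Large City": 0.27, "Mid-size City": 7.3,
        "Large City Fringe": 6.7, "Midsize City Fringe": 11.8,
        "Large Town": 7.8, "Small Town": 51.1, "Rural": 15.0,
    },
    "Suburb of Med. City": {
        "Large City": 0.50, "Mid-size City": 20.8,
        "Large City Fringe": 16.9, "Midsize City Fringe": 34.5,
        "Large Town": 2.2, "Small Town": 11.9, "Rural": 13.2,
    },
}

# Grouping inferred from the totals, not stated in the source.
GROUPS = {
    "Urban": ["Large City", "Mid-size City"],
    "Suburban": ["Large City Fringe", "Midsize City Fringe", "Large Town"],
    "Small Town/Rural": ["Small Town", "Rural"],
}


def collapse(row):
    """Sum detailed percentages into the three collapsed categories."""
    return {
        group: round(sum(row[code] for code in codes), 1)
        for group, codes in GROUPS.items()
    }


collapsed = {report: collapse(row) for report, row in locale_pct.items()}
for report, row in collapsed.items():
    print(report, row)
```

Either way the table is cut, about two-thirds of principals who reported "small city" were in schools coded small town or rural, which is the validity concern the example raises.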



Understanding Relationships: 
Generalizability 

Defines the target population of the research 

Are the results applicable to a broad target 
population or are they too specific to a particular set 
of places, persons, and times to be useful for general
policy making? 

Narrow target populations mean less generalizability, 
but may mean more ability to detect effects 

Example: Shy Females Study 

Broad target populations mean more generalizability, 
but may be less feasible 

Example: Introductory Psychology 

KEY ISSUE-Don't generalize beyond what your 
sample allows! 
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Understanding Relationships: 
Generalizability-cont.

What are the pitfalls of overgeneralization? 

Example-Head Start Study 

Many policy discussions about the efficacy of Head Start 
and decisions about funding of Head Start have been 
based upon a study conducted in Ypsilanti, Michigan of a 
model Head Start program. 

What were the characteristics of the program? 

How many children were in the study? 

What were the results of the study? 

How have they been used? 



A MORE IN-DEPTH LOOK AT ONE EXAMPLE:

THE CASE OF RESEARCH IN BILINGUAL EDUCATION 



References:

Ann Willig, "A Meta-Analysis of Selected Studies on the Effectiveness of
Bilingual Education", RER, 1985, Vol. 55, No. 3

Keith Baker, "Comment on Willig's 'A Meta-Analysis of Selected Studies on the
Effectiveness of Bilingual Education'", RER, 1987, Vol. 57, No. 3

Ann Willig, Response to Baker, RER, 1987, Vol. 57, No. 3

WHAT IS THE OUTCOME VARIABLE? 

- Different interpretations of what constitutes
success.

- Successful as long as it does not hinder children
in the learning of English while it promotes
learning of the nonlanguage subjects.

- Successful if it improves achievement in school. 

- Successful if the children can be taught in the 
second language and still maintain grade level in 
nonlanguage subjects. 

- Successful if it accelerates children's learning 
of English over what it would have been without the 
program. 

WHAT GROUPS ARE BEING COMPARED (TREATMENT/CONTROL)?

WHAT ARE THE CHARACTERISTICS OF THE CHILDREN BEING 
STUDIED AND THEIR COMMUNITIES? 

RESEARCH STRATEGIES 
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PROBLEMS 



LACK OF RANDOM ASSIGNMENT LEADING TO PROBLEMS WITH 
CONFOUNDING FACTORS 

- When random assignment is not used, uncontrolled
differences between the experimental and control
groups can contribute significantly to the results.

CONFOUNDING FACTORS: 

When random assignment was not used, bilingual students 
differed from control students in several ways: 

(a) in language dominance and/or their need for a
bilingual program.

When both groups were Spanish dominant, there
was an effect of almost one half of a standard
deviation favoring the experimental group.

When the experimental group was Spanish
dominant and the comparison group was English
dominant, there was little or no difference
between the groups.

When both groups were English dominant, there
was little difference.

(b) some comparison groups contained students who
were not qualified for a bilingual program, i.e.,
were not deemed limited English proficient.

(c) some comparison groups contained students who
had exited from bilingual programs. These
studies tended to show no benefit for the
bilingual group.

(d) some comparison groups contained schools having
no bilingual program. It is most likely that
in these schools there was an insufficient
number of non-English speaking children in the
attendance center. Children in such schools
tend to be exposed to more English from their
peers, teachers, and neighbors.
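
An effect "of almost one half of a standard deviation", as in item (a), is a difference between group means divided by a standard deviation. The sketch below uses one common convention, the pooled standard deviation of the two groups; the meta-analysis may have used a different one. The scores are invented for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_difference(treatment, control):
    """Difference in means divided by the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores; not data from any of the studies discussed.
bilingual_group = [52, 58, 61, 47, 55, 63, 50, 54]
comparison_group = [49, 56, 58, 45, 52, 60, 47, 51]

effect = standardized_difference(bilingual_group, comparison_group)
print(f"Effect size: {effect:.2f} standard deviations")
```

Expressing the difference in standard deviation units is what allows results from studies with different tests to be placed on one scale.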
Generally, when one has a nonrandomized study and is
concerned about the influence of possible extraneous 
variables, one tries to adjust statistically for these 
differences. Many researchers believe, however, that 
in program evaluation research, such adjustments will 
be underadjustments and will make the program look less 
effective than it really is. 

PROBLEMS WITH THE MAINTENANCE OF DEFINITION OF 
TREATMENT AND CONTROL GROUPS 

In addition, treatment and control programs failed to
maintain their unique identity:

(a) some treatment groups changed in composition
such that, subsequent to the pretest and prior
to the posttest, the better students exited and
more needy students entered.

(b) some treatment programs were unstable (e.g.,
teacher turnover, reorganization of program).

(c) some comparison programs contained elements of
bilingual programs, such as bilingual teachers
or aides who had previously taught in bilingual
programs.



PROBLEMS WITH THE RELIABILITY AND VALIDITY OF THE 
LANGUAGE TESTS 

Many claim that the language tests used to determine 
entry into bilingual programs have low reliability and 
validity. Individuals possess a variety of language
skills, and competence and performance will vary
depending on the context or setting of language use, 
the interactants, their relationships and relative 
statuses, the domain of the communicative intent, and 
the topic. 
"FINE" ANALYSIS CRITERIA FOR
QUANTITATIVE STUDIES



I. Introduction to Problem

A. Stated problem clear and researchable?

B. Thorough review of literature

C. Clear hypotheses/research questions

II. Research Procedures

A. Representative sample

1. Characteristics of sample described

2. Did sample selection methods produce
unbiased sample?

4. Numbers of participating and
nonparticipating given

5. Sample size large enough?

B. Data Gathering Techniques

4. Validity/reliability

C. Research design and procedures
appropriate/replicative

1. Research design appropriate for question

2. Procedures described

3. Research design eliminated confounders

III. Discussion

A. Results appropriate and clear

1. Statistical techniques appropriate

2. Results presented clearly

3. Levels of significance and degrees of
freedom

4. Graphs and tables discussed

5. Every hypothesis tested

B. Results of analysis support conclusions

4. Limitations of study discussed

C. Recommendations for future action

IV. Method Specific Criteria

A. Surveys/Questionnaires

B. Correlational Studies

C. Causal-Comparative Studies
E.g. SES and GPA

2. Extraneous variables identified and
controlled

3. Caution in causal statements

4. Alternative hypotheses discussed

Experimental Studies

1. Group formation methods described

2. Participants selected randomly

3. Random assignment

4. Extraneous variables identified

5. Control for extraneous variables

Quasi-Experimental Studies

1. Groups compared such that relatively
similar

2. Extraneous variables controlled

3. Caution in causal statements
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SOME QUESTIONS TO ASK 
(Victor Cohn) 

How do you know? 

Are there studies supporting the claims? 

Were the studies acceptable ones, by general 
agreement? 

Were there enough people in the study? 

Were appropriate control groups used? 

Was the sample studied representative of the 
population? 

Have results been fairly consistent from study to 
study? 

Do the results hold across subgroups or only for 
particular subpopulations? 

If the results are based on questionnaires, were 
the questions likely to elicit accurate, reliable 
answers? 

What was the response rate? Were the 
nonrespondents different from the respondents? 

Do you have a conclusion or suggestion for further 
study? 
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Are there other possible explanations for the 
differences or relationships you are seeing? 

Have the findings resulted in consensus among 
others in the same field? Do at least the majority 
of informed persons agree? Or should we withhold 
judgment until there is more evidence? 

ARE THE CONCLUSIONS BACKED BY 
BELIEVABLE STATISTICAL EVIDENCE? 

What is the degree of uncertainty? How sure can 
you be? Could these results have occurred by 
chance? 

To whom do the results apply? Who can you 
generalize to? 

Did the investigator frankly discuss possible biases 
or flaws in the study? 

Have the results been reviewed by unbiased 
parties? 

Do the results make sense? 
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SOME SLIPPERY STATISTICS 
(Nancy Spruill, Post) 

1. The Everything's Up Statistic 

Uses numbers rather than rates. 

2. The Best Foot Statistic 

Choose what fits your story: median vs mean; 

year of comparison 

3. The Half Truth Statistic 

Statistic based on special subgroup 

4. Anecdote statistic 

5. Everyone is average statistic

6. Coincidence statistic 

7. Meaningless statistic: e.g. "overall cleanliness of 

NY streets up from 56 to 85% in last 5 years"

8. Unknowable statistic 
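
Items 1 and 2 in the list above lend themselves to a short numerical sketch. All figures below are invented: a count that rises while the rate falls (the "Everything's Up" statistic), and a mean that sits well above the median because of one large value (one way of choosing the "Best Foot").

```python
from statistics import mean, median

# Hypothetical dropout counts and enrollments for two years.
years = {
    1990: {"dropouts": 400, "enrolled": 8_000},
    1995: {"dropouts": 450, "enrolled": 10_000},
}
rates = {year: d["dropouts"] / d["enrolled"] for year, d in years.items()}

count_up = years[1995]["dropouts"] > years[1990]["dropouts"]
rate_down = rates[1995] < rates[1990]
print(f"Dropouts rose from 400 to 450: {count_up}")
print(f"Dropout rate fell from {rates[1990]:.1%} to {rates[1995]:.1%}: {rate_down}")

# Hypothetical salaries with one very large value.
salaries = [30_000, 32_000, 33_000, 35_000, 120_000]
print(f"Mean salary: {mean(salaries):,.0f}; median salary: {median(salaries):,.0f}")
```

Both headlines ("dropouts up" and "dropout rate down") are true of the same data, which is why the choice of statistic deserves a question.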







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